Methods and systems for managing variable delays in packet transmission

ABSTRACT

An improved method and system for the determination of jitter buffers enables the generation of buffers having sizes and delays such that, as designed, the buffers capture a substantial majority of packets while not being resource intensive. The present methods and systems provide for improved jitter buffer management by deriving playout buffer adjustments from a plurality of variances, centered around a distribution peak, or mean average delay. The playout buffer monitor uses the buffer adjustments, in size and delay, to select, store and play out packets at their adjusted playout time. The present invention may be employed in a media gateway that enables data communications among heterogeneous networks and may be specifically deployed to manage jitter experienced in the course of receiving packetized data and processing the data for further transmission through a packet-based or circuit-switched network.

RELATED APPLICATION DATA

[0001] This is a continuation-in-part of copending patent application Ser. No. 10/004,753, for DISTRIBUTED PROCESSING ARCHITECTURE WITH SCALABLE PROCESSING LAYERS, filed Dec. 3, 2001.

FIELD OF THE INVENTION

[0002] The present invention relates generally to a method and system for the communication of digital signals, and more particularly to a method and system for managing delays in packet transmission, e.g. managing jitter, using a buffering procedure, and to a media gateway deploying the jitter management methods and systems.

BACKGROUND OF THE INVENTION

[0003] Media communication devices comprise hardware and software systems that utilize interdependent processes to enable the processing and transmission of analog and digital signals substantially seamlessly across and between circuit switched and packet switched networks. As an example, a voice over packet gateway enables the transmission of human voice from a conventional public switched network to a packet switched network, possibly traveling simultaneously over a single packet network line with both fax information and modem data, and back again. Benefits of unifying communication of different media across different networks include cost savings and the delivery of new and/or improved communication services such as web-enabled call centers for improved customer support and more efficient personal productivity tools.

[0004] Such media over packet communication devices (e.g., Media Gateways) require substantial processing power with sophisticated software controls and applications to enable the effective transmission of data from circuit switched to packet switched networks and back again. One form of media transmission, referred to as voice-over-IP (VoIP), is the transport of voice traffic through the use of the Internet protocol. VoIP requires notably less average bandwidth than a traditional circuit-switched connection for several reasons. First, by detecting when voice activity is present, VoIP can choose to send little or no data when a speaker on one end of a conversation is silent, whereas a conventional, circuit-switched telephone connection continues to transmit during periods of silence. Second, the digital audio bit stream utilized by VoIP may be significantly compressed before transmission using a codec (compression/decompression) scheme. Using current technology, a telephone conversation that would require two 64 kbps (one each way) channels over a circuit-switched network may utilize a data rate of roughly 8 kbps with VoIP.

[0005] In the transmission of digital data between a source and a destination apparatus, frequency distortion known as jitter may be introduced. Jitter is the variable delay experienced in the course of packet transmission, resulting in varied packet arrival times, and is caused by networks providing different waiting times for different packets or cells. It may also be caused by lack of synchronization, which results from mechanical or electrical changes. Given the real time nature of a live connection, jitter buffer management policies have a large effect on the overall data quality. If the data is in the form of a voice, actual sound losses range from a syllable to a word, depending on how much data is in a given packet.

[0006] To rectify the problem of jitter, a receiver may include a buffer to store packets for an amount of time sufficient to allow sequenced, regular playout of the packets. However, an efficient technique is needed to determine the receiver buffer playout length and timing in real-time data communications such as VoIP. If the buffer delay or length is too short, “slower” packets will not arrive before their designated playout time and playout quality suffers. If the buffer delay is very long, it conspicuously disrupts interactive communications. Accurate knowledge of actual packet delays is necessary to determine optimal packet buffer delay for real-time communications.

[0007] One approach to devising an appropriate buffer is to construct and maintain a distribution of the number of packets received by a system over time, namely a histogram. A buffer may then be constructed by equating the buffer length to the entire length of the histogram and equating the buffer initiation point to the time when the first packet is received, e.g., the minimum delay.

[0008] Referring to FIG. 1a, a graph 100 a depicts a histogram 101 a of a number of packets received relative to time. The x-axis 102 a represents the delay experienced by packets and the y-axis 103 a represents the number of packet samples received. The vertical bars 104 a show the number of packets received in a defined span of time. A curve 105 a connects the central points of the tops of the bars 104 a of the histogram 101 a. The curve 105 a depicts the distribution of the arrival time of packets. This curve is called the packet delay distribution (PDD) curve. In telecommunications applications, PDD curves are often skewed earlier in time because most of the packets experience less delay and, therefore, are often not symmetrical around the peak. One of ordinary skill in the art would be familiar with methods of creating histograms.

[0009] Despite existing jitter buffering methods, an improved method and system for playing out packets from media gateways by adaptively adjusting the buffer size delay is needed. More specifically, hardware and software systems and methods are needed that can adaptively determine the buffer size and the buffer initiation point while not being substantially resource intensive.

SUMMARY OF THE INVENTION

[0010] The present invention provides improved methods and systems for the determination of jitter buffers. The present invention enables the generation of buffers having sizes and delays such that, as designed, the buffers capture a substantial majority of packets while not being resource intensive.

[0011] In a first embodiment, a packet delay histogram is estimated using any one of several delay estimation techniques. The histogram represents the distribution of the number of packets received by a system over a defined time. With the distribution in delay determined, a playout delay evaluator calculates a plurality of variances, centered around a distribution peak, or mean average delay, and applies those variances to determine the buffer size and delay. The playout buffer monitor uses this calculated buffer size and delay to select, store and play out packets at their adjusted playout time.

[0012] The present invention may be employed in a media gateway that enables data communications among heterogeneous networks. Media gateways provide media processing functions, data packet encapsulation, and maintain a quality of service level, among other functions. When a gateway operates as a receiver of voice data traffic, it buffers voice packets and outputs a continuous digital or analog stream. The present invention may be deployed to manage jitter experienced in the course of receiving packetized data and processing the data for further transmission through a packet-based or circuit-switched network.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] These and other features and advantages of the present invention will be appreciated, as they become better understood by reference to the following Detailed Description when considered in connection with the accompanying drawings, wherein:

[0014] FIG. 1a is a histogram depicting packets received by a system over time;

[0015] FIG. 1b is a block diagram of a system that employs a first-in, first-out (FIFO) buffer and a numerically controlled oscillator (NCO) for jitter correction;

[0016] FIG. 1c is a schematic waveform representation of jitter;

[0017] FIG. 1d is a diagram illustrating timings associated with the sending and receiving of a packet;

[0018] FIG. 1e depicts a histogram calculation employed in one approach of designing a buffer;

[0019] FIG. 1f depicts a histogram calculation employed in a preferred embodiment of the present invention;

[0020] FIG. 1g is an embodiment of the adaptive playout-buffering process of the present invention;

[0021] FIG. 1h is an arrangement of a playout delay evaluator and buffer monitor used in the present invention;

[0022] FIG. 2a is a block diagram of a first embodiment of a hardware system architecture for a media gateway;

[0023] FIG. 2b is a block diagram of a second embodiment of a hardware system architecture for a media gateway;

[0024] FIG. 3 is a diagram of a packet having a header and user data;

[0025] FIG. 4 is a block diagram of a third embodiment of a hardware system architecture for a media gateway;

[0026] FIG. 5 is a block diagram of one logical division of the software system of the present invention;

[0027] FIG. 6 is a block diagram of a first physical implementation of the software system of FIG. 5;

[0028] FIG. 7 is a block diagram of a second physical implementation of the software system of FIG. 5;

[0029] FIG. 8 is a block diagram of a third physical implementation of the software system of FIG. 5;

[0030] FIG. 9 is a block diagram of a first embodiment of the media engine component of the hardware system of the present invention;

[0031] FIG. 10 is a block diagram of a preferred embodiment of the media layer component of the hardware system of the present invention;

[0032] FIG. 10a is a block diagram representation of a preferred architecture for the media layer component of the media engine of FIG. 10;

[0033] FIG. 11 is a block diagram representation of a first preferred processing unit;

[0034] FIG. 12 is a time-based schematic of the pipeline processing conducted by the first preferred processing unit;

[0035] FIG. 13 is a block diagram representation of a second preferred processing unit;

[0036] FIG. 13a is a time-based schematic of the pipeline processing conducted by the second preferred processing unit;

[0037] FIG. 13b is a time-based schematic of the pipeline processing conducted by a series of processing units;

[0038] FIG. 14 is a block diagram representation of a preferred embodiment of the packet processor component of the hardware system of the present invention;

[0039] FIG. 15 is a schematic representation of one embodiment of the plurality of network interfaces in the packet processor component of the hardware system of the present invention;

[0040] FIG. 16 is a block diagram of a plurality of PCI interfaces used to facilitate control and signaling functions for the packet processor component of the hardware system of the present invention;

[0041] FIG. 17 is a first exemplary flow diagram of data communicated between components of the software system of the present invention;

[0042] FIG. 17a is a second exemplary flow diagram of data communicated between components of the software system of the present invention;

[0043] FIG. 18 is a schematic diagram of preferred components comprising the media processing subsystem of the software system of the present invention;

[0044] FIG. 19 is a schematic diagram of preferred components comprising the media processing subsystem of the software system of the present invention;

[0045] FIG. 20 is a schematic diagram of preferred components comprising the packetization processing subsystem of the software system of the present invention;

[0046] FIG. 21 is a schematic diagram of preferred components comprising the signaling subsystem of the software system of the present invention;

[0047] FIG. 22 is a block diagram of a host application operative on a physical DSP; and

[0048] FIG. 23 is a block diagram of a host application operative on a virtual DSP.

DETAILED DESCRIPTION OF THE INVENTION

[0049] The present invention provides a method and system for jitter management using an adaptive buffer estimation procedure. One use of the present invention is as a novel media gateway, designed to enable the communication of media across circuit switched and packet switched networks, and encompasses novel hardware and software methods and systems. The present invention will presently be described with reference to the aforementioned drawings. Headers will be used for purposes of clarity and are not meant to limit or otherwise restrict the disclosures made herein. It will further be appreciated, by those skilled in the art, that use of the term “media” is meant to broadly encompass substantially all types of data that could be sent across a packet switched or circuit switched network, including, but not limited to, voice, video, data, and fax traffic. Where arrows are utilized in the drawings, it would be appreciated by one of ordinary skill in the art that the arrows represent the interconnection of elements and/or components via buses or any other type of communication channel.

[0050] In one jitter management approach, a clock is derived from a digital data signal and the data signal is stored in a buffer. The derived clock is input to an input counter, which counts a predetermined number of degrees out of phase with an output counter. For instance, the input counter may be initialized 180 degrees out of phase with the output counter. When the input counter is at a maximum counter value, such as 31 in the case where the input counter contains 5 flip-flops, the output counter value is adjusted in accordance with the information processed from a look-up table, preferably a read-only table. This table outputs a coefficient to a numerically controlled oscillator (NCO). The NCO includes a low frequency portion that adds the coefficient successively to itself and outputs a carry out (CO) signal. A high frequency clock, around 100 MHz, is fed to the high frequency portion of the NCO, which preferably divides down the high frequency clock to a clock frequency that is centered at the desired output frequency. The high frequency portion preferably includes an edge detect circuit that receives the CO signal and adjusts the frequency of the output clock to produce a compensation clock. The compensation clock adjusts the output counter, which causes the output buffer to delay a packet of data for a pre-determined amount of time, thereby outputting a digital signal that is substantially free of jitter.

[0051] Referring to FIG. 1b, a block diagram of a system 100 b that employs a FIFO buffer 104 b and a numerically controlled oscillator (NCO) 107 b for jitter correction is provided. It includes an input counter 101 b, an output counter 102 b, an AND gate 103 b, a buffer 104 b, a phase detection latch 105 b, a read only memory (ROM) 106 b, an input data line 109 b, an output line 111 b producing jitter free data, a numerically controlled oscillator (NCO) 107 b, and a high frequency clock 110 b in communication with the NCO 107 b. Input counter 101 b is coupled to an input clock signal line 108 b.

[0052] Variation in packet delay is not a static process. As such, algorithmic approaches are required to estimate packet delay statistics with time-based estimates such as packet mean arrival time and variances from mean arrival time. Dynamic play-out delay adaptation algorithms base their adaptive adjustments on statistics obtained from the timestamp and variable delay histories of the packets received. Information such as timing and stream number (a stream being the continuous data packets after a break) may be gathered from streams of data, and future network delay values are predicted by constructing a measured packet-delay distribution curve. The system maintains a delay histogram, each bin of which stores the relative frequency with which a particular delay value is expected to occur among the arriving packets. The histogram is then used to approximate the distribution in the form of a curve.
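As an illustration of the delay histogram described above, the following minimal Python sketch bins measured delays and stores a relative frequency per bin. The function name `delay_histogram` and the 10 ms default bin width are assumptions made for this example and are not taken from the specification.

```python
from collections import Counter


def delay_histogram(delays_ms, bin_width_ms=10):
    """Map each bin's lower edge (ms) to the relative frequency of delays in it.

    Hypothetical helper: the bin width is an illustrative choice.
    """
    if bin_width_ms <= 0:
        raise ValueError("bin_width_ms must be positive")
    # Assign every delay to the lower edge of the bin that contains it.
    counts = Counter(int(d // bin_width_ms) * bin_width_ms for d in delays_ms)
    total = sum(counts.values())
    # Convert raw counts to relative frequencies, ordered by delay.
    return {edge: count / total for edge, count in sorted(counts.items())}
```

For instance, the delays 12, 15, 27 and 31 ms with 10 ms bins give a relative frequency of 0.5 for the 10 ms bin and 0.25 each for the 20 ms and 30 ms bins.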

[0053] Referring to FIG. 1c, jitter originates and propagates over a network in a digital signal. Waveform 101 c is the ideal communication signal and waveform 102 c is the signal with jitter. An unexpected delay 103 c arises in the signal that may be due to queuing of packets at connecting terminals. The delay 103 c escalates as the signal traverses through the network, resulting in delay 104 c. That variation in delay, calculated as the difference between 103 c and 104 c, is jitter and can increase, decrease, or otherwise modify over time, causing continual variations in the delay time.

[0054] FIG. 1d depicts the various timings associated with the sending and receiving of packet i having data. The packet i is generated by the sending host at time 101 d represented by t_i. The packet i is received at the receiving host at time 102 d represented by a_i. The packet i is played out at the receiving host at time 103 d represented by p_i. D_prop 104 d is the fixed propagation delay from the sender to the receiver, which is assumed to be constant, and set to be the minimum of the delay experienced by any packet. This delay 104 d is revised each time a packet is received whose propagation delay is less than D_prop 104 d and set equal to the propagation delay of that packet. The variable delay, v_i, 106 d experienced by packet i as it is sent from the source to the destination host can be calculated as v_i=a_i−D_prop. The amount of time, b_i, 108 d that packet i spends in the buffer at the receiver awaiting its scheduled playout time can be calculated as b_i=p_i−a_i. The amount of time, d_i, 112 d from when the ith packet is generated by the source until it is played out at the destination host can be calculated as d_i=p_i−t_i, and shall be referred to as the playout delay of packet i. The delay, n_i, 110 d introduced by the network can be calculated as n_i=a_i−t_i.
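The timing relations above can be illustrated with a short Python sketch. It assumes all timestamps are expressed in milliseconds on a common clock; the class `PacketTiming` and the helper `update_d_prop` are hypothetical names, and using the measured network delay as the quantity compared against D_prop is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class PacketTiming:
    t: float  # t_i: time the packet is generated at the sending host (ms)
    a: float  # a_i: time the packet is received at the receiving host (ms)
    p: float  # p_i: time the packet is played out at the receiving host (ms)

    @property
    def network_delay(self) -> float:
        """n_i = a_i - t_i, the delay introduced by the network."""
        return self.a - self.t

    @property
    def buffer_time(self) -> float:
        """b_i = p_i - a_i, the time spent waiting in the receiver buffer."""
        return self.p - self.a

    @property
    def playout_delay(self) -> float:
        """d_i = p_i - t_i, the playout delay of the packet."""
        return self.p - self.t


def update_d_prop(d_prop: float, observed_delay: float) -> float:
    """Lower D_prop whenever a packet shows a smaller delay (assumed rule)."""
    return min(d_prop, observed_delay)
```

One consequence of these definitions is that the playout delay is always the sum of the network delay and the buffer time, since (a_i − t_i) + (p_i − a_i) = p_i − t_i.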

[0055] To construct a histogram for determining the buffer size and delay, packet delays need to be determined. A plurality of methods may be used to calculate delay. In one approach, the jitter buffer system incorporates a method that uses a linear recursive filter and is characterized by the weighting factor alpha. The delay estimate is computed as:

$d_i = \alpha \, d_{i-1} + (1 - \alpha) \, n_i$

[0056] And the variation is computed as:

$v_i = \alpha \, v_{i-1} + (1 - \alpha) \, |d_i - n_i|$

[0057] where α is a weighting factor, d_i is the amount of time from when the ith packet is generated by the source until it is played out at the destination host, n_i is the total delay introduced by the network, and v_i is the variable delay experienced by packet i as it is sent from the source to the destination host.
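A minimal Python sketch of one update step of this linear recursive filter follows. The function name `update_estimates` and the default weighting factor of 0.875 are assumptions chosen for illustration; the specification does not fix a value for α in this approach.

```python
def update_estimates(d_prev, v_prev, n_i, alpha=0.875):
    """Return (d_i, v_i) from the previous estimates and the new delay n_i.

    alpha is the weighting factor; 0.875 is only an illustrative default.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Delay estimate: weighted blend of the old estimate and the new sample.
    d_i = alpha * d_prev + (1 - alpha) * n_i
    # Variation estimate: smoothed absolute deviation of the sample.
    v_i = alpha * v_prev + (1 - alpha) * abs(d_i - n_i)
    return d_i, v_i
```

With a previous delay estimate of 100 ms, a previous variation of 10 ms and a new network delay of 180 ms, the step yields a delay estimate of 110 ms and a variation of 17.5 ms. A weighting factor closer to 1 makes the estimates react more slowly to new samples.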

[0058] A second approach adapts more quickly to the short burst of packets incurring long delays by using a weighting mechanism which incorporates two values into the weighting factor, one indicative of increasing trends in the delay and one indicative of decreasing trends.

[0059] if (n_i > d_{i−1}) then

$d_i = \beta \, d_{i-1} + (1 - \beta) \, n_i$

[0060] else

$d_i = \alpha \, d_{i-1} + (1 - \alpha) \, n_i$
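The two-weight update can be sketched in Python as follows. The function name `update_delay_two_weights` and the default values of α and β are assumptions for illustration only; the sketch compares the new network delay against the current estimate and picks the weight accordingly.

```python
def update_delay_two_weights(d_prev, n_i, alpha=0.875, beta=0.75):
    """Update the delay estimate with a weight chosen by the delay trend.

    beta applies when the new delay exceeds the current estimate (increasing
    trend), alpha otherwise. Both defaults are illustrative assumptions.
    """
    weight = beta if n_i > d_prev else alpha
    return weight * d_prev + (1 - weight) * n_i
```

Choosing β smaller than α gives new samples more influence when delays are rising, so the estimate follows a burst of long delays quickly and then decays more slowly once delays fall again.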

[0061] A third approach calculates the delay estimate as:

$d_i = \min_{j \in S_i} \{ n_j \}$

[0062] where S_i is the set of all packets received during the talk spurt prior to the one initiated by packet i.

[0063] A fourth approach adapts to sudden, large increases in the end-to-end network delay followed by a series of packets arriving almost simultaneously, referred to herein as spikes. The detection of the beginning of a spike is done by checking the delay between consecutive packets at the receiver so that the delay is large enough for it to constitute a spike. For example:

[0064] if (abs(n_i − n_{i−1}) > spike_threshold)

[0065] mode = IMPULSE;

[0066] A variable var is employed with an exponentially decaying value that adjusts to the slope of the spike. When this variable has a small enough value, indicating that there is no longer a significant slope, the algorithm reverts back to normal mode.

1. n_i = Receiver_timestamp − Sender_timestamp;
2. if (mode == NORMAL) {
       if (abs(n_i − n_{i−1}) > abs(v) * 2 + 800) {
           var = 0; /* Detected beginning of spike */
           mode = IMPULSE;
       }
   } else {
       var = var/2 + abs((2*n_i − n_{i−1} − n_{i−2})/8);
       if (var ≤ 63) {
           mode = NORMAL; /* End of spike */
           n_{i−2} = n_{i−1};
           return;
       }
   }
3. if (mode == NORMAL)
       d_i = 0.125 * n_i + 0.875 * d_{i−1};
   else
       d_i = d_{i−1} + n_i − n_{i−1};
   v_i = 0.125 * abs(n_i − d_i) + 0.875 * v_{i−1};
4. n_{i−2} = n_{i−1};
   n_{i−1} = n_i;
   return;
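The spike-handling steps can be sketched as a small Python class. This is an illustrative reading of the listing, not a definitive implementation: the class name `SpikeAwareEstimator` is hypothetical, delays are in the same timestamp units as the constants 800 and 63, the spike is assumed to end when var falls to 63 or below (following the prose statement that a small value returns the algorithm to normal mode), and the sketch refreshes both history values before the early return, which is an assumption.

```python
NORMAL, IMPULSE = "NORMAL", "IMPULSE"


class SpikeAwareEstimator:
    """Delay estimator that follows spikes directly instead of filtering them."""

    def __init__(self, first_delay):
        self.mode = NORMAL
        self.d = first_delay   # delay estimate d_i
        self.v = 0.0           # variation estimate v_i
        self.var = 0.0         # decaying measure of the spike slope
        self.n1 = first_delay  # n_{i-1}
        self.n2 = first_delay  # n_{i-2}

    def update(self, n):
        """Process one network delay n_i and return the delay estimate."""
        if self.mode == NORMAL:
            if abs(n - self.n1) > abs(self.v) * 2 + 800:
                self.var = 0.0          # detected beginning of spike
                self.mode = IMPULSE
        else:
            self.var = self.var / 2 + abs((2 * n - self.n1 - self.n2) / 8)
            if self.var <= 63:          # slope has flattened: end of spike
                self.mode = NORMAL
                self.n2, self.n1 = self.n1, n  # assumption: refresh history
                return self.d
        if self.mode == NORMAL:
            self.d = 0.125 * n + 0.875 * self.d
        else:
            self.d = self.d + n - self.n1   # track the spike one-for-one
        self.v = 0.125 * abs(n - self.d) + 0.875 * self.v
        self.n2, self.n1 = self.n1, n
        return self.d
```

In impulse mode the estimate moves by the full change in network delay, so it rises with the spike and falls back with it, while the smoothed update is used only in normal mode.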

[0067] By calculating the packet delays as against the number of packets received, a packet delay histogram may be constructed. The packet delay histogram may be used to determine the required buffer size and delay by, for example, equating the buffer length to the length of the histogram and the buffer delay to the minimum delay experienced by the received packets, represented by the first data points on the histogram.

[0068] Relying on an entire histogram for estimating the buffer size is resource intensive, however. It is preferred, rather, to use only the most important parts of the histogram for constructing the buffer, more specifically to limit the buffer to times when a majority of packets arrive. Therefore, once the histogram is estimated using a particular packet delay calculation method, it is preferred to choose a portion of the histogram to enable the efficient determination of a buffer size and delay.

[0069] One approach is to calculate the variance of the histogram, specifically the standard deviation around when the peak number of packets arrive, and add that variance to a minimum delay experienced by the system. For example, if the variance is 60 ms and the minimum delay is 30 ms, then the buffer begins storing packets at the 30 ms point and continues storing packets for 60 ms. To better correspond to experimental conditions, the variance used to determine the buffer parameters can be a calculated variance derived by multiplying the variance of the histogram by a multiplier (k).

[0070] Another approach is to define the selected histogram portion as the variance around the peak of the histogram. The histogram peak may be calculated by computing the mean, or the average delay of the histogram. In calculating the peak, it is preferred to first eliminate a portion of the histogram tail to avoid having the trailing portion of the histogram excessively skew the calculation. The average is then calculated and associated with the peak. Using the peak, the variance of the histogram may be calculated. Once the peak and variance of the histogram are calculated, the buffer size is obtained.

[0071] Preferably, the variance used to determine the buffer parameters is a calculated variance derived by multiplying the variance of the histogram by a multiplier (k). For example, to capture packets around the peak, the buffer size should preferably encompass a period=k*variance where k=2, thereby capturing packets within the variance period before the peak and within the variance period after the peak. The buffer initiation point, or minimum delay, is defined as minimum delay=mean−(k/2)*variance. For example, where the variance is 80 ms and the mean is 150 ms, the buffer begins accepting packets at 70 ms and continues accepting for another 160 ms, or up to 230 ms.
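The arithmetic of this symmetric window can be checked with a short Python sketch. The function name `symmetric_buffer_window` is hypothetical; the formulas are the period and minimum-delay expressions stated in the paragraph above.

```python
def symmetric_buffer_window(mean_ms, variance_ms, k=2):
    """Return (start, size, end) of a buffer window centered on the mean.

    size  = k * variance
    start = mean - (k / 2) * variance
    """
    if k <= 0 or variance_ms < 0:
        raise ValueError("k must be positive and variance non-negative")
    size = k * variance_ms
    start = mean_ms - (k / 2) * variance_ms
    return start, size, start + size
```

For a mean of 150 ms, a variance of 80 ms and k=2, the function returns a start of 70 ms, a size of 160 ms and an end of 230 ms, matching the worked example.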

[0072] Referring to FIG. 1e, the graph represents histogram 101 e of a packet stream, specifically a depiction of the number of packets received at different points in time by the system. The x-axis 102 e represents the delay experienced by packets and the y-axis 103 e represents the number of packet samples received. The vertical bars 104 e show the number of packets received in a defined span of time. A curve 105 e connects the central point of tops of the bars 104 e of the histogram 101 e. The curve 105 e depicts the distribution of the arrival time of packets.

[0073] To avoid skewing the peak, or mean delay, calculation, the tail is eliminated at a defined point 106 e, which in this example is 270 ms on the x-axis 102 e. Therefore, the histogram area to the right of point 106 e is discarded. The mean of the curve 107 e may be calculated by using the formula:

$M = \frac{\sum_i x_i}{N}$

[0074] where M is the mean, x_i represents the amount of delay experienced by packets arriving in a particular window of time i, and N is the total number of samples. Then the variance, Var, is calculated by using the formula:

$\mathrm{Var} = \frac{\sum_i (x_i - M)^2}{N} \quad \text{or} \quad \mathrm{Var} = \frac{\sum_i |x_i - M|}{N}$

[0075] As shown in FIG. 1e, the mean is 150 ms and the variance is 90 ms.
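The tail elimination and the two variance forms can be sketched in Python as follows. The sketch works on individual delay samples rather than on histogram bins, which is a simplifying assumption, and the function name `trimmed_mean_and_variance` and its parameters are hypothetical.

```python
def trimmed_mean_and_variance(delays_ms, tail_cutoff_ms, squared=False):
    """Discard samples beyond the cutoff, then return (M, Var).

    squared=False uses the mean absolute deviation form of Var;
    squared=True uses the mean squared deviation form.
    """
    kept = [x for x in delays_ms if x <= tail_cutoff_ms]
    if not kept:
        raise ValueError("no samples remain after removing the tail")
    n = len(kept)
    mean = sum(kept) / n
    if squared:
        var = sum((x - mean) ** 2 for x in kept) / n
    else:
        var = sum(abs(x - mean) for x in kept) / n
    return mean, var
```

With samples of 100, 200 and 400 ms and a cutoff of 270 ms, the 400 ms sample is discarded, giving a mean of 150 ms and an absolute-deviation variance of 50 ms. Without the cutoff the single late sample would pull the mean up to roughly 233 ms, which illustrates why the tail is removed first.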

[0076] With the mean delay and variance having been calculated, the buffer size may be defined as k*Var, where k can be any number, but is preferably in the range of 2 to 8 and more preferably either 2, 4 or 8, and the buffer begins accepting packets at the point defined by

Initiation Point=M−(k/2)*Var

[0077] In the present example the initiation point equals 60 ms, k=2, and buffer size equals 180 ms. Thus, the buffer accepts packets from 60 ms to 240 ms.

[0078] Referring to FIG. 1f, the graph represents histogram 101 f of a packet stream received by a system. The x-axis 102 f represents the delay experienced by packets and the y-axis 103 f represents the number of packet samples received. The vertical bars 104 f show the number of packets received in a defined span of time. A curve 105 f connects the central point of tops of the bars 104 f of the histogram 101 f. The curve 105 f depicts the distribution of the arrival time of packets.

[0079] As previously discussed, to avoid skewing the peak, or mean delay, calculation, the tail is eliminated at a defined point 106 f, which in this example is 270 ms on the x-axis 102 f. Therefore, the histogram area to the right of point 106 f is discarded. The mean of the curve 107 f may be calculated by using the formula

$M = \frac{\sum_i x_i}{N}$

[0080] where M is the mean, x_i represents the amount of delay experienced by packets arriving in a particular window of time i, and N is the total number of samples.

[0081] Rather than determine a single variance for the histogram and utilize that single variance to calculate the buffer size and delay, the preferred embodiment of the invention utilizes at least two separately calculated variances to better estimate the buffer size and delay based upon the estimated histogram. Preferably, to calculate the plurality of variances, the histogram is conceptually divided into two portions, a portion encompassing packets that arrived prior to the mean delay and a portion encompassing the packets arriving after the mean delay. Where i packets have been received and the mean delay is associated with packet m, the first histogram portion is defined by D_0 to D_{m−1} and the second by D_{m+1} to D_i, or the final packet. The variance of D_0 to D_{m−1}, Var₁, may be calculated using the formula:

$\mathrm{Var}_1 = \frac{\sum_j (x_j - M)^2}{N_{0\ \mathrm{to}\ m-1}} \quad \text{or} \quad \mathrm{Var}_1 = \frac{\sum_j |x_j - M|}{N_{0\ \mathrm{to}\ m-1}}$

[0082] where j extends from 0 to m−1 and the total number of samples includes those samples from 0 to m−1. Similarly, the variance of D_{m+1} to D_i, Var₂, may be calculated using the formula:

$\mathrm{Var}_2 = \frac{\sum_j (x_j - M)^2}{N_{m+1\ \mathrm{to}\ i}} \quad \text{or} \quad \mathrm{Var}_2 = \frac{\sum_j |x_j - M|}{N_{m+1\ \mathrm{to}\ i}}$

[0083] where j extends from m+1 to i and the total number of samples includes those samples from m+1 to i. Although the two separately calculated variances are calculated using one sample set of packets arriving before the mean delay and one sample set of packets arriving after the mean delay, one would appreciate that the variances can be calculated using sample sets that overlap or that, when taken together, comprise a subset of packets received.

[0084] Typically, the two variances are not equal because the histogram is asymmetrical. As shown in FIG. 1f, Var₁ 115 f is less than Var₂ 117 f, reflective of the asymmetrical nature of the histogram and better approximating the actual distribution of packets received. This approach therefore represents an improved approach to ascertaining the size and placement of the buffer more accurately while optimizing computational resources.

[0085] Optionally, Var₁ can be calculated from Var₂, or vice versa, using pre-defined equations. As an example, Var₁ could be a multiple or factor of Var₂, i.e., Var₁ = C*Var₂, where C is a constant that is determined experimentally. Alternatively, Var₁ could be a fixed value depending on whether Var₂ exceeds or does not exceed a certain threshold value.

[0086] After the peak and variances are calculated, the buffer size and timing can be determined. The buffer starts accepting packets at delay d, which is determined by subtracting Var₁ 115 f from the mean 107 f.

d = M − Var₁

[0087] and continues accepting for a period (T) which is the sum of the two variances.

T = Var₁ + Var₂

[0088] For example, where Var₁ is 60 ms, Var₂ is 105 ms and the mean is 150 ms, the buffer starts accepting packets at 90 ms and continues accepting for period T of 165 ms, or up to 255 ms. The variances used to determine the buffer parameters can also be calculated variances derived by multiplying Var₁ and/or Var₂ by a multiplier (k), where the multiplier is any number, but preferably in the range of 2 to 8, and more preferably around 2, 4 or 8.
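The two-variance calculation can be sketched in Python as follows. The sketch uses the absolute-deviation form of the variances and operates on individual delay samples, both of which are choices made for this example; the function names `side_variances` and `asymmetric_buffer_window` are hypothetical.

```python
def side_variances(delays_ms):
    """Return (M, Var1, Var2) using absolute deviations on each side of M.

    Var1 covers samples earlier than the mean, Var2 samples later than it.
    Samples exactly at the mean belong to neither side.
    """
    if not delays_ms:
        raise ValueError("at least one delay sample is required")
    mean = sum(delays_ms) / len(delays_ms)
    early = [x for x in delays_ms if x < mean]
    late = [x for x in delays_ms if x > mean]
    var1 = sum(mean - x for x in early) / len(early) if early else 0.0
    var2 = sum(x - mean for x in late) / len(late) if late else 0.0
    return mean, var1, var2


def asymmetric_buffer_window(mean_ms, var1_ms, var2_ms):
    """Return (d, T, end) with d = M - Var1 and T = Var1 + Var2."""
    start = mean_ms - var1_ms
    period = var1_ms + var2_ms
    return start, period, start + period
```

As a usage example, seven samples at 90 ms and four at 255 ms have a mean of 150 ms, Var₁ of 60 ms and Var₂ of 105 ms, so the window starts at 90 ms, lasts 165 ms and ends at 255 ms, which reproduces the figures in the worked example.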

[0089] FIG. 1g depicts a block diagram of an adaptive process used for jitter correction using the above-described buffering method. The system comprises a sender 101 g and a receiver 102 g, which is comprised of a subtractor 103 g, a delay evaluator 104 g, a playout delay evaluator 106 g, and a playout buffer monitor 107 g. After being properly delayed, the packet is then sent to playout unit 112 g.

[0090] Packet i is sent from the sender 101 g with a timestamp t_i and reaches the receiver at time a_i. Using the timestamp, the subtractor 103 g subtracts t_i from a_i to produce the delay n_i for the packet i. The delay evaluator 104 g analyzes this value and performs one of the aforementioned delay evaluation techniques to generate the distribution of delays that comprise a packet delay histogram. The estimated packet delay histogram is communicated by the delay evaluator 104 g to the playout delay evaluator 106 g which, based upon a portion of the communicated histogram, determines the size and delay of the buffer employed by the playout buffer monitor 107 g. The receiver 102 g, in accordance with the adjusted playout time, outputs packets to the playout unit 112 g for the final playout of the packet.

[0091] In an embodiment, upon determining mean delay and variance(s), delay smoothing is applied to the actual playout of packets by a delay smoother. While mean delay and variance are used to determine a calculated playout time, the use of delay smoothing further controls changes in playout time to specifically improve voice quality. Increases in playout time are applied in larger steps while decreases in playout time are limited to smaller steps. If the calculated playout time calls for an increase in buffer delay, buffer delay is increased by an amount greater than requested. If the calculated playout time calls for a decrease in buffer delay, buffer delay is decreased by an amount less than requested.
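One way to picture this asymmetric smoothing is the Python sketch below. The specification does not state how much larger or smaller the applied steps are, so the gains `up_gain` and `down_gain`, their default values, and the function name `smooth_buffer_delay` are all assumptions made for illustration.

```python
def smooth_buffer_delay(current_ms, target_ms, up_gain=1.25, down_gain=0.5):
    """Move the buffer delay toward the calculated target asymmetrically.

    Increases are amplified (up_gain >= 1); decreases are damped
    (0 <= down_gain <= 1). Both gains are illustrative assumptions.
    """
    if up_gain < 1.0 or not 0.0 <= down_gain <= 1.0:
        raise ValueError("expected up_gain >= 1 and 0 <= down_gain <= 1")
    change = target_ms - current_ms
    gain = up_gain if change > 0 else down_gain
    return current_ms + gain * change
```

With these example gains, a requested increase from 100 ms to 140 ms is applied as 150 ms, while a requested decrease from 100 ms to 60 ms is applied as 80 ms. Growing quickly and shrinking slowly favors avoiding late packets over minimizing delay.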

[0092] Referring to FIG. 1h, the playout delay evaluator 100 h and playout buffer monitor 103 h are shown in communication with an output device 114 h and data input 104 h. The playout delay evaluator 100 h preferably comprises a control circuit 101 h and packet delay distribution system 102 h for the calculation of buffer size and delay characteristics. The playout buffer monitor 103 h preferably comprises a packet data storage memory 112 h, buffer control circuit 107 h, delay timer 108 h, pointer list 109 h, and input and output controllers 111 h and 113 h respectively. It also contains stream parameter block 105 h and drift control block 106 h. The calculation of the mean delay and variances used to determine the buffer size and delay characteristics may be performed by the delay evaluator or by the playout delay evaluator 100 h, based upon data received from the delay evaluator.

[0093] Together with the packet delay distribution system, the control circuit 101 h manages the calculation and communication of a set of buffer configuration parameters for each data stream and allocates buffer resources for each stream. Control circuit 101 h calculates the buffer size requirements for the stream using the packet size S(p), in bytes, and the packet rate T(r), e.g. one packet every 10 milliseconds. Dividing the buffer delay, BD, by the packet rate T(r) yields the number of packets PS that the buffer needs to accommodate, i.e., the number of packet slots in the buffer 103 h.

PS=BD/T(r)

[0094] The buffer size, S(B), is then the product of packet size S(p) and the number of packet slots PS.

S(B)=PS*S(p)
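The two sizing formulas can be exercised with a short sketch. The function name is hypothetical, rounding the slot count up to a whole slot is an assumption (the text does not say how a fractional result is handled), and the 80-byte packet size is only an illustrative value.

```python
import math


def buffer_dimensions(buffer_delay_ms, packet_interval_ms, packet_size_bytes):
    """Return (PS, S(B)): packet slots and buffer size in bytes.

    PS = BD / T(r), rounded up to a whole slot (assumption);
    S(B) = PS * S(p).
    """
    slots = math.ceil(buffer_delay_ms / packet_interval_ms)
    return slots, slots * packet_size_bytes


# BD = 165 ms, T(r) = 10 ms per packet, S(p) = 80 bytes (illustrative).
print(buffer_dimensions(165, 10, 80))
```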

[0095] Control circuit 101 h allocates a block of memory 112 h having S(B) bytes and a pointer list 109 h having PS slots for buffering each stream. Control circuit 101 h also initializes buffer control circuits 107 h for the stream. As shown in FIG. 1h, an input controller 111 h and an output controller 113 h are allocated to the buffer 103 h. Input and output controllers 111 h and 113 h transfer data between the data input 104 h or output device 114 h, respectively, and the buffer memory 112 h. Buffer control 107 h contains all the logic circuits necessary to oversee operation of buffer 103 h and provide updated information to control circuit 101 h.

[0096] Buffer control 107 h maintains a packet pointer for each data packet stored in buffer 103 h. Each packet pointer contains the starting address of its respective packet contained in memory 112 h. The pointers are stored by buffer control 107 h in pointer list 109 h, which has a fixed number of slots, equal to PS, for storing packet pointers. Buffer control 107 h manipulates pointer list 109 h as a shift register with PS slots, numbered 0 through PS-1. Slot 0 contains the pointer for the packet that is to be output next. The contents of each slot are shifted into the next adjacent slot towards the output slot 0, at the packet rate, namely, every T(r) seconds. The buffer delay of a packet is determined by the position of its pointer in the pointer list 109 h. A packet whose pointer is in the 3^(rd) slot will experience a buffer delay of 3*T(r) seconds.

[0097] As each packet is received by buffer circuit 103 h, the proper location for storing the packet in the buffer memory 112 h is determined by buffer control circuit 107 h, which passes a packet pointer, i.e., a starting address for the location in the memory where the packet data will be stored, to input circuit 111 h. Input circuit 111 h stores the packet data in memory starting at the pointer address as the data is received from network 104 h.

[0098] The starting address is also stored as a packet pointer in the pointer list 109 h at a slot location determined by the buffer control circuit 107 h. The pointers may be placed in the pointer list at slot locations determined by the packet sequence. Thus, if packet i+2 is received after the first packet i, it is placed 2 slots higher in the list than the present location of the pointer for packet i, provided that packet i+2 is not earlier in the sequence than the packet last output by output circuit 113 h. The use of packet sequence information to select slot locations helps out-of-order packets to be re-ordered without moving packet data.

[0099] Control circuit 101 h checks the sequence number of each packet being received against the sequence number of the packet last output by output circuit 113 h. If the sequence number of the incoming packet is lower than that of the packet last output by the buffer, the packet being received is discarded because it has arrived too late to be output in sequence. Buffer control 107 h maintains a last-played register to keep track of the last packet output for this purpose.

[0100] In response to a signal from timer 108 h, buffer control 107 h sends the pointer contents of the output slot 0 in the pointer list 109 h to output control 113 h, which then moves the packet data stored at the respective memory location to the output device 114 h. With each signal from timer 108 h, buffer control 107 h also shifts each pointer down one slot in the pointer list as described above. Normally, timer 108 h is set to generate a signal at the packet rate, i.e., every T(r) seconds, to ensure that the playout rate for packets is the same as the packet rate.
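The pointer list behavior described in the preceding paragraphs (sequence-based slot placement, late-packet discard, and a timer-driven shift toward slot 0) can be sketched as follows. This is a simplified software model under stated assumptions, not the circuit itself: the class name, the `initial_slot` parameter that places the first packet, and the rejection of packets that fall beyond the last slot are all hypothetical.

```python
class PointerList:
    """Shift-register model of a pointer list with a fixed number of slots."""

    def __init__(self, slots, initial_slot):
        self.slots = [None] * slots       # slot 0 is the output slot
        self.initial_slot = initial_slot  # assumed slot for the first packet
        self.next_seq = None              # sequence number that belongs in slot 0

    def insert(self, seq, address):
        """Place a packet pointer by sequence number; False means discarded."""
        if self.next_seq is None:
            self.next_seq = seq - self.initial_slot
        index = seq - self.next_seq
        if index < 0 or index >= len(self.slots):
            return False  # too late to play in sequence, or beyond the buffer
        self.slots[index] = address
        return True

    def tick(self):
        """Timer signal: output slot 0 and shift every pointer down one slot."""
        if self.next_seq is None:
            return None
        address = self.slots[0]
        self.slots = self.slots[1:] + [None]
        self.next_seq += 1
        return address


plist = PointerList(slots=5, initial_slot=2)
plist.insert(10, 0x100)
plist.insert(12, 0x300)  # arrives out of order, lands two slots above packet 10
plist.insert(11, 0x200)
print([plist.tick() for _ in range(5)])
```

Because only pointers move, the out-of-order packet 12 is re-ordered without copying any packet data, which is the point made in the text.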

[0101] The packet delay distribution system 102 h provides information to the control circuit 101 h and buffer control 107 h concerning the delay experienced by packets in the network. Control circuit 101 h may also provide feedback to reflect changing network operating characteristics. Control circuit 101 h may also update the buffer characteristics, i.e., buffer size and pointer list, in response to changing packet delay distribution.

[0102] If the rate of incoming packets is faster than the rate at which they are output by the output device 114 h, buffer overflow will result. Drift control 106 h maintains stream synchronization in the presence of such clock drifts by discarding a packet periodically to prevent buffer overflow. If the receiver clock is faster than the transmitter, drift control circuit 106 h causes a packet to be repeated periodically or outputs a blank or dummy packet so that the output device 114 h always has a packet to process.
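A rough sketch of how often such a correction would be needed is given below. It assumes the sender and receiver packet rates are known in packets per second and that one correction is made each time the accumulated drift reaches one whole packet; the function name and return format are hypothetical.

```python
def drift_correction(sender_pps, receiver_pps):
    """Return (action, seconds between corrections) for a given clock drift.

    Assumption: one packet is discarded or repeated each time the
    rate difference accumulates to one whole packet.
    """
    diff = sender_pps - receiver_pps
    if diff == 0:
        return ("none", None)
    action = "discard" if diff > 0 else "repeat_or_dummy"
    return (action, 1.0 / abs(diff))


print(drift_correction(100.5, 100.0))   # sender faster: discard periodically
print(drift_correction(100.0, 100.25))  # receiver faster: repeat or pad
```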

[0103] The jitter management method and system will be further described in the context of an implementation within an exemplary application.

[0104] Exemplary Application

[0105] The present invention can be used to enable the operation of a novel media gateway. The hardware system architecture of said novel gateway is comprised of a plurality of distributed processing layer processors, referred to as Media Engines, that are in communication with a data bus and interconnected with a Host Processor or a Packet Engine which, in turn, is in communication with interfaces to networks, preferably an asynchronous transfer mode (ATM) physical device or gigabit media independent interface (GMII) physical device.

[0106] Referring to FIG. 2a, a first embodiment of the top-level hardware system architecture is shown. A data bus 205 a is connected to interfaces 210 a existent on a first novel Media Engine Type I 215 a and on a second novel Media Engine Type I 220 a. The first novel Media Engine Type I 215 a and second novel Media Engine Type I 220 a are connected through a second set of communication buses 225 a to a novel Packet Engine 230 a which, in turn, is connected through interfaces 235 a to outputs 240 a, 245 a. Preferably, each of the Media Engines Type I 215 a, 220 a is in communication with a SRAM 246 a and SDRAM 247 a.

[0107] It is preferred that the data bus 205 a be a time-division multiplex (TDM) bus. A TDM bus is a pathway for the transmission of a number of separate voice, fax, modem, video, and/or other data signals simultaneously over a single communication medium. The separate signals are transmitted by interleaving a portion of each signal with each other, thereby enabling one communications channel to handle multiple separate transmissions and avoiding having to dedicate a separate communication channel to each transmission. Existing networks use TDM to transmit data from one communication device to another. It is further preferred that the interfaces 210 a existent on the first novel Media Engine Type I 215 a and second novel Media Engine Type I 220 a comply with H.100, a hardware specification that details the necessary information to implement a CT bus interface at the physical layer for the PCI computer chassis card slot, independent of software specifications. The CT bus defines a single isochronous communications bus across certain PC chassis card slots and allows for the relatively fluid inter-operation of components. It is appreciated that interfaces abiding by different hardware specifications could be used to receive signals from the data bus 205 a.

[0108] As described below, each of the two novel Media Engines Type I 215 a, 220 a can support a plurality of channels for processing media, such as voice. The specific number of channels supported is dependent upon the features required, such as the extent of echo cancellation, and type of codec supported. For codecs having relatively low processing power requirements, such as G.711, each Media Engine Type I 215 a, 220 a can support the processing of around 256 voice channels or more. Each Media Engine Type I 215 a, 220 a is in communication with the Packet Engine 230 a through a communication bus 225 a, preferably a peripheral component interconnect (PCI) communication bus. A PCI communication bus serves to deliver control information and data transfers between the Media Engine Type I chip 215 a, 220 a and the Packet Engine chip 230 a. Because Media Engine Type I 215 a, 220 a was designed to support the processing of lower data volumes, relative to Media Engine Type II described below, a single PCI communication bus can effectively support the transfer of both control and data between the designated chips. It is appreciated, however, that where data traffic becomes too great, the PCI communication bus must be supplemented with a second inter-chip communication bus.

[0109] The Packet Engine 230 a receives processed data from each of the two Media Engines Type I 215 a, 220 a via the communication bus 225 a. While theoretically able to connect to a plurality of Media Engines Type I, it is preferred that, for this embodiment, the Packet Engine 230 a be in communication with up to two Media Engines Type I 215 a, 220 a. As will be further described below, the Packet Engine 230 a provides cell and packet encapsulation for data channels, at or around 2016 channels in a preferred embodiment, quality of service functions for traffic management, tagging for differentiated services and multi-protocol label switching, and the ability to bridge cell and packet networks. While it is preferred to use the Packet Engine 230 a, it can be replaced with a different host processor, provided that the host processor is capable of performing the above-described functions of the Packet Engine 230 a.

[0110] The Packet Engine 230 a is in communication with an ATM physical device 240 a and GMII physical device 245 a. The ATM physical device 240 a is capable of receiving processed and packetized data, as passed from the Media Engines Type I 215 a, 220 a through the Packet Engine 230 a, and transmitting it through a network operating on an asynchronous transfer mode (an ATM network). As would be appreciated by one of ordinary skill in the art, an ATM network automatically adjusts the network capacity to meet the system needs and can handle voice, modem, fax, video and other data signals. Each ATM data cell, or packet, consists of five octets of header field plus 48 octets for user data. The header contains data that identifies the related cell, a logical address that identifies the routing, header error correction bits, plus bits for priority handling and network management functions. An ATM network is a wideband, low delay, connection-oriented, packet-like switching and multiplexing network that allows for relatively flexible use of the transmission bandwidth. The GMII physical device 245 a operates under a standard for the receipt and transmission of a certain amount of data, irrespective of the media types involved.

[0111] The embodiment shown in FIG. 2a can deliver voice processing up to Optical Carrier Level 1 (OC-1). OC-1 is designated at 51.840 million bits per second and provides for the direct electrical-to-optical mapping of the synchronous transport signal (STS-1) with frame synchronous scrambling. Higher optical carrier levels are direct multiples of OC-1; for example, OC-3 is three times the rate of OC-1. As shown below, other configurations of the present invention could be used to support voice processing at OC-12.
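As a quick illustration of the multiple relationship, the line rates for the optical carrier levels mentioned in this description can be computed from the OC-1 rate. The function name is hypothetical.

```python
OC1_MBPS = 51.84  # OC-1 line rate stated above, in million bits per second


def oc_rate_mbps(level):
    """Line rate of OC-n as a direct multiple of the OC-1 rate."""
    return level * OC1_MBPS


for level in (1, 3, 12):
    print(f"OC-{level}: {oc_rate_mbps(level):.2f} Mbit/s")
```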

[0112] Referring now to FIG. 2b, an embodiment supporting data rates up to OC-3 is shown, referred to herein as an OC-3 Tile 200 b. A data bus 205 b is connected to interfaces 210 b existent on a first novel Media Engine Type II 215 b and on a second novel Media Engine Type II 220 b. The first novel Media Engine Type II 215 b and second novel Media Engine Type II 220 b are connected through a second set of communication buses 225 b, 227 b to a novel Packet Engine 230 b which, in turn, is connected through interfaces 260 b, 265 b to outputs 240 b, 245 b and through interface 250 b to a Host Processor 255 b.

[0113] As previously discussed, it is preferred that the data bus 205 b be a time-division multiplex (TDM) bus and that the interfaces 210 b existent on the first novel Media Engine Type II 215 b and second novel Media Engine Type II 220 b comply with the H.100 hardware specification. It is again appreciated that interfaces abiding by different hardware specifications could be used to receive signals from the data bus 205 b.

[0114] Each of the two novel Media Engines Type II 215 b, 220 b can support a plurality of channels for processing media, such as voice. The specific number of channels supported is dependent upon the features required, such as the extent of echo cancellation, and type of codec implemented. For codecs having relatively low processing power requirements, such as G.711, and where the extent of echo cancellation required is 128 milliseconds, each Media Engine Type II can support the processing of approximately 2016 channels of voice. With two Media Engines Type II providing the processing power, this configuration is capable of supporting data rates of OC-3. Where the Media Engines Type II 215 b, 220 b are implementing a codec requiring higher processing power, such as G.729A, the number of supported channels decreases. As an example, the number of supported channels decreases from 2016 per Media Engine Type II when supporting G.711 to approximately 672 to 1024 channels when supporting G.729A. To match OC-3, an additional Media Engine Type II can be connected to the Packet Engine 230 b via the common communication buses 225 b, 227 b.

[0115] Each Media Engine Type II 215 b, 220 b is in communication with the Packet Engine 230 b through communication buses 225 b, 227 b, preferably a peripheral component interconnect (PCI) communication bus 225 b and a UTOPIA II/POS II communication bus 227 b. As previously mentioned, where data traffic volumes exceed a certain threshold, the PCI communication bus 225 b must be supplemented with a second communication bus 227 b. Preferably, the second communication bus 227 b is a UTOPIA II/POS-II bus and serves as the data path between Media Engines Type II 215 b, 220 b and the Packet Engine 230 b. A POS (Packet over SONET) bus represents a high-speed means for transmitting data through a direct connection, allowing the passing of data in its native format without the addition of any significant level of overhead in the form of signaling and control information. UTOPIA (Universal Test and Operations PHY Interface for ATM) refers to an electrical interface between the transmission convergence and physical medium dependent sublayers of the physical layer and acts as the interface for devices connecting to an ATM network.

[0116] The physical interface is configured to operate in POS-II mode which allows for variable size data frame transfers. Each packet is transferred using POS-II control signals to explicitly define the start and end of a packet. As shown in FIG. 3, each packet 300 contains a header 305 with a plurality of information fields and user data 310. Preferably, each header 305 contains information fields including packet type 315 (e.g., RTP, raw encoded voice, AAL2), packet length 320 (total length of the packet including information fields), and channel identification 325 (identifies the physical channel, namely the TDM slot for which the packet is intended or from which the packet came). When dealing with encoded data transfers between a Media Engine Type II 215 b, 220 b and Packet Engine 230 b, it is further preferred to include coder/decoder type 330, sequence number 335, and voice activity detection decision 340 in the header 305.
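The header fields named above can be modeled as a simple record. This is an illustrative data structure only: the specification does not give field widths, encodings, or ordering on the bus, so the Python types, field names, and the `is_encoded_transfer` helper are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PacketHeader:
    """Header 305 fields; the numerals in comments refer to FIG. 3."""
    packet_type: str                        # 315, e.g. "RTP", "raw", "AAL2"
    packet_length: int                      # 320, total length incl. header fields
    channel_id: int                         # 325, TDM slot the packet maps to
    codec_type: Optional[str] = None        # 330, encoded transfers only
    sequence_number: Optional[int] = None   # 335, encoded transfers only
    vad_decision: Optional[bool] = None     # 340, encoded transfers only

    def is_encoded_transfer(self) -> bool:
        """True when all three optional encoded-data fields are present."""
        optional = (self.codec_type, self.sequence_number, self.vad_decision)
        return all(field is not None for field in optional)


header = PacketHeader("RTP", 172, 5, codec_type="G.711",
                      sequence_number=42, vad_decision=False)
print(header.is_encoded_transfer())
```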

[0117] The Packet Engine 230 b is in communication with the Host Processor 255 b through a PCI target interface 250 b. The Packet Engine 230 b preferably includes a PCI to PCI bridge [not shown] between the PCI interface 226 b to the PCI communication bus 225 b and the PCI target interface 250 b. The PCI to PCI bridge serves as a link for communicating messages between the Host Processor 255 b and two Media Engines Type II 215 b, 220 b.

[0118] The novel Packet Engine 230 b receives processed data from each of the two Media Engines Type II 215 b, 220 b via the communication buses 225 b, 227 b. While theoretically able to connect to a plurality of Media Engines Type II, it is preferred that the Packet Engine 230 b be in communication with no more than three Media Engines Type II 215 b, 220 b [only two are shown in FIG. 2b]. As with the previously described embodiment, Packet Engine 230 b provides cell and packet encapsulation for data channels, up to 2048 channels when implementing a G.711 codec, quality of service functions for traffic management, tagging for differentiated services and multi-protocol label switching, and the ability to bridge cell and packet networks. The Packet Engine 230 b is in communication with an ATM physical device 240 b and GMII physical device 245 b through a UTOPIA II/POS II compatible interface 260 b and GMII compatible interface 265 b, respectively. In addition to the GMII interface 265 b in the physical layer, referred to herein as the PHY GMII interface, the Packet Engine 230 b also preferably has another GMII interface [not shown] in the MAC layer of the network, referred to herein as the MAC GMII interface. MAC is a media specific access control protocol defining the lower half of the data link layer that defines topology dependent access control protocols for industry standard local area network specifications.

[0119] As will be further discussed, the Packet Engine 230 b is designed to enable ATM-IP internetworking. Telecommunication service providers have built independent networks operating on an ATM or IP protocol basis. Enabling ATM-IP internetworking permits service providers to support the delivery of substantially all digital services across a single networking infrastructure, thereby reducing the complexities introduced by having multiple technologies/protocols operative throughout a service provider's entire network. The Packet Engine 230 b is therefore designed to enable a common network infrastructure by providing for the internetworking between ATM modes and IP modes.

[0120] More specifically, the novel Packet Engine 230 b supports the internetworking of ATM AALs (ATM Adaptation Layers) to specific IP protocols. Divided into a convergence sublayer and segmentation/reassembly sublayer, AAL accomplishes conversion from the higher layer, native data format and service specifications into the ATM layer. From the data originating source, the process includes segmentation of the original and larger set of data into the size and format of an ATM cell, which comprises 48 octets of data payload and 5 octets of overhead. On the receiving side, the AAL accomplishes reassembly of the data. AAL-1 functions in support of Class A traffic which is connection-oriented Constant Bit Rate (CBR), time-dependent traffic, such as uncompressed, digitized voice and video, and which is stream-oriented and relatively intolerant of delay. AAL-2 functions in support of Class B traffic which is connection-oriented Variable Bit Rate (VBR) isochronous traffic requiring relatively precise timing between source and sink, such as compressed voice and video. AAL-5 functions in support of Class C traffic which is Variable Bit Rate (VBR) delay-tolerant connection-oriented data traffic requiring relatively minimal sequencing or error detection support, such as signaling and control data.
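The segmentation step can be illustrated with a small calculation of how many 53-octet cells a block of data occupies. This sketch deliberately ignores any AAL-specific headers, trailers, or padding rules, which vary by adaptation layer; the function name is hypothetical.

```python
import math

CELL_PAYLOAD_OCTETS = 48  # data payload per ATM cell
CELL_HEADER_OCTETS = 5    # overhead per ATM cell


def atm_cells(data_octets):
    """Return (cell count, total octets on the wire) for a block of data.

    Simplification: AAL-specific overhead inside the payload is not modeled.
    """
    cells = math.ceil(data_octets / CELL_PAYLOAD_OCTETS)
    return cells, cells * (CELL_PAYLOAD_OCTETS + CELL_HEADER_OCTETS)


print(atm_cells(100))  # 100 octets of data need 3 cells
```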

[0121] These ATM AALs are internetworked with protocols operative in an IP network, such as RTP, UDP, TCP and IP. Internet Protocol (IP) describes software that tracks the Internet's addresses for different nodes, routes outgoing messages, and recognizes incoming messages while allowing a data packet to traverse multiple networks from source to destination. Realtime Transport Protocol (RTP) is a standard for streaming realtime multimedia over IP in packets and supports transport of real-time data, such as interactive video and video, over packet switched networks. Transmission Control Protocol (TCP) is a transport layer, connection oriented, end-to-end protocol that provides relatively reliable, sequenced, and unduplicated delivery of bytes to a remote or a local user. User Datagram Protocol (UDP) provides for the exchange of datagrams without acknowledgements or guaranteed delivery and is a transport layer, connectionless mode protocol. In the preferred embodiment represented in FIG. 2, it is preferred that ATM AAL-1 be internetworked with RTP, UDP, and IP protocols, AAL-2 be internetworked with UDP and IP protocols, and AAL-5 be internetworked with UDP and IP protocols or TCP and IP protocols.

[0122] Multiple OC-3 tiles, as presented in FIG. 2b, can be interconnected to form a tile supporting higher data rates. As shown in FIG. 4, four OC-3 tiles 405 can be interconnected, or “daisy chained”, together to form an OC-12 tile 400. Daisy chaining is a method of connecting devices in a series such that signals are passed through the chain from one device to the next. By enabling daisy chaining, the present invention provides for currently unavailable levels of scalability in data volume support and hardware implementation. A Host Processor 455 is connected via communication buses 425, preferably PCI communication buses, to the PCI interface 435 on each of the OC-3 tiles 405. Each OC-3 tile 405 has a TDM interface 460 that operates via a TDM communication bus 465 to receive TDM signals via a TDM interface [not shown]. Each OC-3 tile 405 is further in communication with an ATM physical device 490 through a communication bus 495 connected to the OC-3 tile 405 through a UTOPIA II/POS II interface 470. Data received by an OC-3 tile 405 and not processed, because, for example, the data packet is directed toward a specific packet engine address that was not found in that specific OC-3 tile 405, is sent to the next OC-3 tile 405 in the series via the PHY GMII interface 410 and received by the next OC-3 tile via the MAC GMII interface 413. Enabling daisy chaining eliminates the need for an external aggregator to interface the GMII interfaces on each of the OC-3 tiles in order to enable integration. The final OC-3 tile 405 is in communication with a GMII physical device 417 via the PHY GMII interface 410.
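The forwarding rule for the daisy chain can be sketched as a simple walk along the tiles. This is a toy model: representing each tile as a set of packet engine addresses, the function name, and treating data that no tile claims as leaving through the final tile toward the GMII physical device are assumptions made for illustration.

```python
def route_through_chain(tile_addresses, destination):
    """Return the index of the tile that processes the packet.

    Each tile that does not hold the destination address passes the
    packet to the next tile (PHY GMII out, MAC GMII in). If no tile
    claims it, the packet is assumed to exit via the final tile.
    """
    for index, addresses in enumerate(tile_addresses):
        if destination in addresses:
            return index
    return "gmii_physical_device"


# Four OC-3 tiles forming an OC-12 tile; the addresses are hypothetical.
chain = [{"pe0"}, {"pe1"}, {"pe2"}, {"pe3"}]
print(route_through_chain(chain, "pe2"))
print(route_through_chain(chain, "elsewhere"))
```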

[0123] Operating on the above-described hardware architecture embodiments is a plurality of novel, integrated software systems designed to enable media processing, signaling, and packet processing. Referring now to FIG. 5, a logical division of the software system 500 is shown. The software system 500 is divided into three subsystems, a Media Processing Subsystem 505, a Packetization Subsystem 540, and a Signaling/Management Subsystem 570. Each subsystem 505, 540, 570 further comprises a series of modules 520 designed to perform different tasks in order to effectuate the processing and transmission of media. It is preferred that the modules 520 be designed in order to encompass a single core task that is substantially non-divisible. For example, exemplary modules include echo cancellation, codec implementation, scheduling, IP-based packetization, and ATM-based packetization, among others. The nature and functionality of the modules 520 deployed in the present invention will be further described below.

[0124] The logical system of FIG. 5 can be physically deployed in a number of ways, depending on processing needs, due, in part, to the novel software architecture, to be described below. As shown in FIG. 6, one physical embodiment of the software system described in FIG. 5 is to be on a single chip 600, where the media processing block 610, packetization block 620, and management block 630 are all operative on the same chip. If processing needs increase, thereby requiring more chip power be dedicated to media processing, the software system can be physically implemented such that the media processing block 710 and packetization block 720 operate on a DSP 715 that is in communication via a data bus 770 with the management block 730 that operates on a separate host processor 735, as depicted in FIG. 7. Similarly, if processing needs further increase, the media processing block 810 and packetization block 820 can be implemented on separate DSPs 860, 865 and communicate via data buses 870 with each other and with the management block 830 that operates on a separate host processor 835, as depicted in FIG. 8. Within each block, the modules can be physically separated onto different processors to enable a high degree of system scalability.

[0125] In an embodiment, four OC-3 tiles are combined onto a single integrated circuit (IC) card wherein each OC-3 tile is configured to perform media processing and packetization tasks. The IC card has four OC-3 tiles in communication via data buses. As previously described, the OC-3 tiles each have three Media Engine II processors in communication via interchip communication buses with a Packet Engine processor. The Packet Engine processor has a MAC and PHY interface by which communications external to the OC-3 tiles are performed. The PHY interface of the first OC-3 tile is in communication with the MAC interface of the second OC-3 tile. Similarly, the PHY interface of the second OC-3 tile is in communication with the MAC interface of the third OC-3 tile and the PHY interface of the third OC-3 tile is in communication with the MAC interface of the fourth OC-3 tile. The MAC interface of the first OC-3 tile is in communication with the PHY interface of a host processor. Operationally, each Media Engine II processor implements the Media Processing Subsystem of the present invention, shown in FIG. 5 as 505. Each Packet Engine processor implements the Packetization Subsystem of the present invention, shown in FIG. 5 as 540. The host processor implements the Management Subsystem, shown in FIG. 5 as 570.

[0126] The primary components of the top-level hardware system architecture will now be described in further detail, including Media Engine Type I, Media Engine Type II, and Packet Engine. Additionally, the software architecture, along with specific features, will be further described in detail.

[0127] Media Engines

[0128] Both Media Engine I and Media Engine II are types of DPLPs and therefore comprise a layered architecture wherein each layer encodes and decodes up to N channels of voice, fax, modem, or other data depending on the layer configuration. Each layer implements a set of pipelined processing units specially designed through substantially optimal hardware and software partitioning to perform specific media processing functions. The processing units are special-purpose digital signal processors that are each optimized to perform a particular signal processing function or a class of functions. By creating processing units that are capable of performing a well-defined class of functions, such as echo cancellation or codec implementation, and placing them in a pipeline structure, the present invention provides a media processing system and method with substantially greater performance than conventional approaches.

[0129] Referring to FIG. 9, a diagram of Media Engine I 900 is shown. Media Engine I 900 comprises a plurality of Media Layers 905 each in communication with a central direct memory access (DMA) controller 910 via communication data buses 920. Using a DMA approach enables the bypassing of a system processing unit to handle the transfer of data between itself and system memory directly. Each Media Layer 905 further comprises an interface to the DMA 925 interconnected with the communication data buses 920. In turn, the DMA interface 925 is in communication with each of a plurality of pipelined processing units (PUs) 930 via communication data buses 920 and a plurality of program and data memories 940, via communication data buses 920, that are situated between the DMA interface 925 and each of the PUs 930. The program and data memories 940 are also in communication with each of the PUs 930 via data buses 920. Preferably, each PU 930 can access at least one program memory and at least one data memory unit 940. Further, it is also preferred to have at least one first-in, first-out (FIFO) task queue [not shown] to receive scheduled tasks and queue them for operation by the PUs 930.

[0130] While the layered architecture of the present invention is not limited to a specific number of Media Layers, certain practical limitations may restrict the number of Media Layers that can be stacked into a single Media Engine I. As the number of Media Layers increases, the memory and device input/output bandwidth may increase to such an extent that the memory requirements, pin count, density, and power consumption are adversely affected and become incompatible with application or economic requirements. Those practical limitations, however, do not represent restrictions on the scope and substance of the present invention.

[0131] Media Layers 905 are in communication with an interface to the central processing unit 950 (CPU IF) through communication buses 920. The CPU IF 950 transmits and receives control signals and data from an external scheduler 955, the DMA controller 910, a PCI interface (PCI IF) 960, a SRAM interface (SRAM IF) 975, and an interface to an external memory, such as an SDRAM interface (SDRAM IF) 970 through communication buses 920. The PCI IF 960 is preferably used for control signals. The SDRAM IF 970 connects to a synchronized dynamic random access memory module whereby the memory access cycles are synchronized with the CPU clock in order to eliminate wait time associated with memory fetching between random access memory (RAM) and the CPU. In a preferred embodiment, the SDRAM IF 970 that connects the processor with the SDRAM supports 133 MHz synchronous DRAM and asynchronous memory. It supports one bank of SDRAM (64 Mbit/256 Mbit to 256 MB maximum) and 4 asynchronous devices (8/16/32 bit) with a data path of 32 bits and fixed length as well as undefined length block transfers and accommodates back-to-back transfers. Eight transactions may be queued for operation. The SDRAM [not shown] contains the states of the PUs 930. One of ordinary skill in the art would appreciate that, although not preferred, other external memory configurations and types could be selected in place of the SDRAM and, therefore, that another type of memory interface could be used in place of the SDRAM IF 970.

[0132] The SDRAM IF 970 is further in communication with the PCI IF 960, DMA controller 910, the CPU IF 950, and, preferably, the SRAM interface (SRAM IF) 975 through communication buses 920. The SRAM [not shown] is a static random access memory that is a form of random access memory that retains data without constant refreshing, offering relatively fast memory access. The SRAM IF 975 is also in communication with a TDM interface (TDM IF) 980, the CPU IF 950, the DMA controller 910, and the PCI IF 960 via data buses 920.

[0133] In an embodiment, the TDM IF 980 for the trunk side is preferably H.100/H.110 compatible and the TDM bus 981 operates at 8.192 MHz. The TDM IF 980 enables the Media Engine I 900 to provide 8 data signals, thereby delivering a capacity of up to 512 full duplex channels, and has the following preferred features: an H.100/H.110 compatible slave; a frame size that can be set to 16 or 20 samples, with the scheduler able to program the TDM IF 980 to store a specific buffer or frame size; and programmable staggering points for the maximum number of channels. Preferably, the TDM IF 980 interrupts the scheduler after every N samples of the 8,000 Hz clock, with the number N being programmable with possible values of 2, 4, 6, and 8. In a voice application, the TDM IF 980 preferably does not transfer the pulse code modulation (PCM) data to memory on a sample-by-sample basis, but rather buffers 16 or 20 samples of a channel, depending on the frame size which the encoders and decoders are using, and then transfers the voice data for that channel to memory.
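The channel capacity stated above can be checked with a short calculation. The following is only a sketch under assumptions that the text does not state explicitly: each data signal carries one bit per bus clock cycle, each PCM time slot occupies 64 kbit/s (8 bits at 8,000 Hz), and one full duplex channel consumes two time slots. The function name is hypothetical.

```python
# Sketch: TDM capacity arithmetic under the assumptions named in the lead-in.
def tdm_full_duplex_channels(bus_clock_hz, data_signals, slot_rate_bps=64_000):
    """Return the number of full duplex channels a TDM bus could carry."""
    slots_per_signal = bus_clock_hz // slot_rate_bps   # time slots on one data signal
    total_slots = slots_per_signal * data_signals      # time slots across all signals
    return total_slots // 2                            # assumed: two slots per full duplex channel

print(tdm_full_duplex_channels(8_192_000, 8))   # 512
```

Under the same assumptions, 32 data signals at 8.192 MHz would give 2048 full duplex channels.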

[0134] The PCI IF 960 is also in communication with the DMA controller 910 via communication buses 920. External connections comprise connections between the TDM IF 980 and a TDM bus 981, between the SRAM IF 975 and a SRAM bus 976, between the SDRAM IF 970 and a SDRAM bus 971, preferably operating at 32 bit@133 MHz, and between the PCI IF 960 and a PCI 2.1 Bus 961 also preferably operating at 32 bit@133 MHz.

[0135] External to Media Engine I, the scheduler 955 maps the channels to the Media Layers 905 for processing. When the scheduler 955 is processing a new channel, it assigns the channel to one of the layers, depending upon processing resources available per layer 905. Each layer 905 handles the processing of a plurality of channels such that the processing is performed in parallel and is divided into fixed frames, or portions of data. The scheduler 955 communicates with each Media Layer 905 through the transmission of data, in the form of tasks, to the FIFO task queues wherein each task is a request to the Media Layer 905 to process a plurality of data portions for a particular channel. It is therefore preferred for the scheduler 955 to initiate the processing of data from a channel by putting a task in a task queue, rather than programming each PU 930 individually. More specifically, it is preferred to have the scheduler 955 initiate the processing of data from a channel by putting a task in the task queue of a particular PU 930 and having the Media Layer's 905 pipeline architecture manage the data flow to subsequent PUs 930.

[0136] The scheduler 955 should manage the rate at which each of the channels is processed. In an embodiment where the Media Layer 905 is required to accept the processing of data from M channels and each of the channels uses a frame size of T msec, it is preferred that the scheduler 955 processes one frame of each of the M channels within each T msec interval. Further, in a preferred embodiment, the scheduling is based upon periodic interrupts, in the form of units of samples, from the TDM IF 980. As an example, if the interrupt period is 2 samples then it is preferred that the TDM IF 980 interrupts the scheduler every time it gathers two new samples of all channels. The scheduler preferably maintains a ‘tick-count’, which is incremented on every interrupt and reset to 0 when time equal to a frame size has passed. The mapping of channels to time slots is preferably not fixed. For example, in voice applications, whenever a call starts on a channel, the scheduler dynamically assigns a layer to a provisioned time slot channel. It is further preferred that the data transfer from a TDM buffer to the memory is aligned with the time slot in which this data is processed, thereby staggering the data transfer for different channels from TDM to memory, and vice-versa, in a manner that is equivalent to the staggering of the processing of different channels. Consequently, it is further preferred that the TDM IF 980 maintains a tick count variable wherein there is some synchronization between the tick counts of the TDM IF 980 and the scheduler 955. In the exemplary embodiment described above, the tick count variable is reset to zero every 2 ms or 2.5 ms, depending on the buffer size.
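The tick-count behavior can be illustrated with a minimal sketch. It assumes, beyond what the text states, that the frame size is an exact multiple of the interrupt period; the class and attribute names are hypothetical. With an 8,000 Hz sample clock, a 16-sample frame lasts 2 ms and a 20-sample frame lasts 2.5 ms.

```python
SAMPLE_RATE_HZ = 8000  # TDM sample clock given in the description

class TickCounter:
    """Hypothetical model of the scheduler tick-count."""

    def __init__(self, frame_samples, interrupt_samples):
        # Assumption: the frame is a whole number of interrupt periods.
        if frame_samples % interrupt_samples != 0:
            raise ValueError("frame size must be a multiple of the interrupt period")
        self.ticks_per_frame = frame_samples // interrupt_samples
        self.frame_ms = 1000 * frame_samples / SAMPLE_RATE_HZ
        self.tick = 0

    def on_interrupt(self):
        """Advance one tick; return True when a full frame time has elapsed."""
        self.tick += 1
        if self.tick == self.ticks_per_frame:
            self.tick = 0  # reset once time equal to a frame size has passed
            return True
        return False

counter = TickCounter(frame_samples=16, interrupt_samples=2)
print(counter.ticks_per_frame, counter.frame_ms)  # 8 2.0
```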

[0137] Referring to FIG. 10, a block diagram of Media Engine II 1000 is shown. Media Engine II 1000 comprises a plurality of Media Layers 1005 each in communication with processing layer controller 1007, referred to herein as a Media Layer Controller 1007, and central direct memory access (DMA) controller 1010 via communication data buses and an interface 1015. Each Media Layer 1005 is in communication with a CPU interface 1006 which, in turn, is in communication with a CPU 1004. Within each Media Layer 1005, a plurality of pipelined processing units (PUs) 1030 are in communication with a plurality of program memories 1035 and data memories 1040, via communication data buses. Preferably, each PU 1030 can access at least one program memory 1035 and one data memory 1040. Each of the PUs 1030, program memories 1035, and data memories 1040 is in communication with an external memory 1047 via the Media Layer Controller 1007 and DMA 1010. In a preferred embodiment, each Media Layer 1005 comprises four PUs 1030, each of which is in communication with a single program memory 1035 and data memory 1040, wherein each of the PUs 1031, 1032, 1033, 1034 is in communication with each of the other PUs 1031, 1032, 1033, 1034 in the Media Layer 1005.

[0138] FIG. 10a shows a preferred embodiment of the architecture of the Media Layer Controller, or MLC. A program memory 1005 a, preferably 512×64, operates in conjunction with a controller 1010 a and data memory 1015 a to deliver data and instructions to a data register file 1017 a, preferably 16×32, and address register file 1020 a, preferably 4×12. The data register file 1017 a and address register file 1020 a are in communication with functional units such as an adder/MAC 1025 a, logical unit 1027 a, and barrel shifter 1030 a and with units such as a request arbitration logic unit 1033 a and DMA channel bank 1035 a.

[0139] Referring back to FIG. 10, the MLC 1007 arbitrates data and program code transfer requests to and from the program memories 1035 and data memories 1040 in a round robin fashion. On the basis of this arbitration the MLC 1007 fills the data pathways that define how units directly access memory, namely the DMA channels [not shown]. The MLC 1007 is capable of performing instruction decoding to route an instruction according to its dataflow and keep track of the request states for all PUs 1030, such as the state of a read-in request, a write-back request and an instruction forwarding. The MLC 1007 is further capable of conducting interface related functions, such as programming DMA channels, starting signal generation, maintaining page states for PUs 1030 in each Media Layer 1005, decoding of scheduler instructions, and managing the movement of data from and into the task queues of each PU 1030. By performing the aforementioned functions, the Media Layer Controller 1007 substantially eliminates the need for associating complex state machines with the PUs 1030 present in each Media Layer 1005.

[0140] The DMA controller 1010 is a multi-channel DMA unit for handling the data transfers between the local memory buffer PUs and external memories, such as the SDRAM. Preferably, DMA channels are programmed dynamically. More specifically, PUs 1030 generate independent requests, each having an associated priority level, and send them to the MLC 1007 for reading or writing. Based upon the priority request delivered by a particular PU 1030, the MLC 1007 programs the DMA channel accordingly. Preferably, there is also an arbitration process, such as a single level of round robin arbitration, between the channels within the DMA to access the external memory. The DMA Controller 1010 provides hardware support for round robin request arbitration across the PUs 1030 and Media Layers 1005.
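A single level of round robin arbitration can be sketched as follows. This is an illustrative software model only, not the hardware described; it ignores the per-request priority levels mentioned above, and the class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Hypothetical single-level round robin arbiter over n requesters."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # so that requester 0 is examined first

    def grant(self, requests):
        """Grant the next requesting index after the last grant, or None.

        `requests` is the set of requester indices currently asserting a request.
        """
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if candidate in requests:
                self.last = candidate
                return candidate
        return None

arbiter = RoundRobinArbiter(4)
print([arbiter.grant({0, 2, 3}) for _ in range(4)])  # [0, 2, 3, 0]
```

Because the search always starts just after the most recent grant, no requester that keeps its request asserted can be passed over more than once per rotation.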

[0141] In an exemplary operation, it is preferred to conduct transfers between local PU memories and external memories by utilizing the address of the local memory, address of the external memory, size of the transfer, direction of the transfer, namely whether the DMA channel is transferring data to the local memory from the external memory or vice-versa, and how many transfers are required for each PU. In this preferred embodiment, a DMA channel is generated and receives this information from two 32-bit registers residing in the DMA. A third register, which contains the current status of the DMA transfer, exchanges control information between the DMA and each PU. In a preferred embodiment, arbitration is performed among the following requests: 1 structure read, 4 data read and 4 data write requests from each Media Layer, approximately 90 data requests in total, and 4 program code fetch requests from each Media Layer, approximately 40 program code fetch requests in total. The DMA Controller 1010 is preferably further capable of arbitrating priority for program code fetch requests, conducting link list traversal and DMA channel information generation, and performing DMA channel prefetch and done signal generation.

[0142] The MLC 1007 and DMA Controller 1010 are in communication with a CPU IF 1006 through communication buses. The PCI IF 1060 is in communication with an external memory interface (such as a SDRAM IF) 1070 and with the CPU IF 1006 via communication buses. The external memory interface 1070 is further in communication with the MLC 1007 and DMA Controller 1010 and a TDM IF 1080 through communication buses. The SDRAM IF 1070 is in communication with a packet processor interface, such as a UTOPIA II/POS compatible interface (U2/POS IF), 1090 via communication data buses. The U2/POS IF 1090 is also preferably in communication with the CPU IF 1006. Although the preferred embodiments of the PCI IF and SDRAM IF are similar to those of Media Engine I, it is preferred that the TDM IF 1080 have all 32 serial data signals implemented, thereby supporting at least 2048 full duplex channels. External connections comprise connections between the TDM IF 1080 and a TDM bus 1081, between the external memory interface 1070 and a memory bus 1071, preferably operating at 64 bit@133 MHz, between the PCI IF 1060 and a PCI 2.1 Bus 1061 also preferably operating at 32 bit@133 MHz, and between the U2/POS IF 1090 and a UTOPIA II/POS connection 1091 preferably operating at 622 megabits per second. In a preferred embodiment, the TDM IF 1080 for the trunk side is H.100/H.110 compatible and the TDM bus 1081 operates at 8.192 MHz, as previously discussed in relation to the Media Engine I.

[0143] For both Media Engine I and Media Engine II, within each media layer, the present invention utilizes a plurality of pipelined PUs specially designed for conducting a defined set of processing tasks. In that regard, the PUs are not general purpose processors and cannot be used to conduct arbitrary processing tasks. A survey and analysis of specific processing tasks yielded certain functional unit commonalities that, when combined, yield a specialized PU capable of optimally processing the universe of those specialized processing tasks. The instruction set architecture of each PU yields compact code. Increased code density results in a decrease in required memory and, consequently, a decrease in required area, power, and memory traffic.

[0144] The pipeline architecture also improves performance. Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like an assembly line, different steps are completing different parts of different instructions in parallel. Each of these steps is called a pipe stage or a data segment. Each stage is connected to the next to form a pipe. Within a processor, instructions enter the pipe at one end, progress through the stages, and exit at the other end. The throughput of an instruction pipeline is determined by how often an instruction exits the pipeline.

[0145] More specifically, one type of PU (referred to herein as EC PU) has been specially designed to perform, in a pipeline architecture, a plurality of media processing functions, such as echo cancellation (EC), voice activity detection (VAD), and tone signaling (TS) functions. Echo cancellation removes from a signal echoes that may arise as a result of the reflection and/or retransmission of modified input signals back to the originator of the input signals. Commonly, echoes occur when signals that were emitted from a loudspeaker are then received and retransmitted through a microphone (acoustic echo) or when reflections of a far end signal are generated in the course of transmission along hybrid wires (line echo). Although undesirable, echo is tolerable in a telephone system, provided that the time delay in the echo path is relatively short. However, longer echo delays can be distracting or confusing to a far end speaker. Voice activity detection determines whether a meaningful signal or noise is present at the input. Tone signaling comprises the processing of supervisory, address, and alerting signals over a circuit or network by means of tones. Supervisory signals monitor the status of a line or circuit to determine if it is busy, idle, or requesting service. Alerting signals indicate the arrival of an incoming call. Address signals comprise routing and destination information.

[0146] The line echo cancellation (LEC), VAD, and TS functions can be efficiently executed using a PU having several single-cycle multiply and accumulate (MAC) units operating with an Address Generation Unit and an Instruction Decoder. Each MAC unit includes a compressor, sum and carry registers, an adder, and a saturation and rounding logic unit. In a preferred embodiment, shown in FIG. 11, this PU 1100 comprises a load store architecture with a single Address Generation Unit (AGU) 1105, supporting zero-overhead looping and branching with delay slots, and an Instruction Decoder 1106. The plurality of MAC units 1110 operate in parallel on two 16-bit operands and perform the following function:

Acc+=a*b

[0147] Guard bits are appended to the sum and carry registers to facilitate repeated MAC operations. A scale unit prevents accumulator overflow. Each MAC unit 1110 may be programmed to perform round operations automatically. Additionally, it is preferred to have an addition/subtraction unit [not shown] as a conditional sum adder with both the input operands being 20-bit values and the output operand being a 16-bit value.

[0148] Operationally, the EC PU performs tasks in a pipeline fashion. A first pipeline stage comprises an instruction fetch wherein instructions are fetched into an instruction register from program memory. A second pipeline stage comprises an instruction decode and operand fetch wherein an instruction is decoded and stored in a decode register. The hardware loop machine is initialized in this cycle. Operands from the data register files are stored in operand registers. The AGU operates during this cycle. The address is placed on the data memory address bus. In the case of a store operation, data is also placed on the data memory data bus. For post increment or decrement instructions, the address is incremented or decremented after being placed on the address bus. The result is written back to the address register file. The third pipeline stage, the Execute stage, comprises the operation on the fetched operands by the Addition/Subtraction Unit and MAC units. The status register is updated and the computed result or data loaded from memory is stored in the data/address register files. The states and history information required for the EC PU operations are fetched through a multi-channel DMA interface, as previously shown in each Media Layer. The EC PU configures the DMA controller registers directly. The EC PU loads the DMA chain pointer with the memory location of the head of the chain link.

[0149] By enabling different data streams to move through the pipelined stages concurrently, the EC PU reduces wait time for processing incoming media, such as voice. Referring to FIG. 12, in time slot 1 1205, an instruction fetch task (IF) is performed for processing data from channel 1 1250. In time slot 2 1206, the IF task is performed for processing data from channel 2 1255 while, concurrently, an instruction decode and operand fetch (IDOF) is performed for processing data from channel 1 1250. In time slot 3 1207, an IF task is performed for processing data from channel 3 1260 while, concurrently, an instruction decode and operand fetch (IDOF) is performed for processing data from channel 2 1255 and an Execute (EX) task is performed for processing data from channel 1 1250. One of ordinary skill in the art would appreciate that, because channels are dynamically generated, the channel numbering may not reflect the actual location and assignment of a task. Channel numbering here is used to simply indicate the concept of pipelining across multiple channels and not to represent actual task locations.
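The overlap of stages across channels described for FIG. 12 can be reproduced with a small sketch. The function name and the dictionary representation are hypothetical and serve only to illustrate the concept; as noted above, channel numbers do not represent actual task locations.

```python
def pipeline_schedule(stages, num_channels, num_slots):
    """Return, for each time slot, a dict mapping stage name to channel number.

    A channel enters the first stage in the slot equal to its number and
    advances one stage per slot (an idealized pipeline with no stalls).
    """
    schedule = []
    for slot in range(1, num_slots + 1):
        active = {}
        for depth, stage in enumerate(stages):
            channel = slot - depth  # channel that reached this stage in this slot
            if 1 <= channel <= num_channels:
                active[stage] = channel
        schedule.append(active)
    return schedule

for slot, active in enumerate(pipeline_schedule(["IF", "IDOF", "EX"], 3, 3), start=1):
    print(slot, active)
# 1 {'IF': 1}
# 2 {'IF': 2, 'IDOF': 1}
# 3 {'IF': 3, 'IDOF': 2, 'EX': 1}
```

Passing a four-entry stage list such as IF, IDOF, EX1, EX2 yields the analogous arrangement for a four-stage pipeline.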

[0150] A second type of PU (referred to herein as CODEC PU) has been specially designed to perform, in a pipeline architecture, a plurality of media processing functions, such as encoding and decoding signals in accordance with certain standards and protocols, including standards promoted by the International Telecommunication Union (ITU) such as voice standards, including G.711, G.723.1, G.726, G.728, G.729A/B/E, and data modem standards, including V.17, V.34, and V.90, among others (referred to herein as Codecs), and performing comfort noise generation (CNG) and discontinuous transmission (DTX) functions. The various Codecs are used to encode and decode voice signals with differing degrees of complexity and resulting quality. CNG is the generation of background noise that gives users a sense that the connection is live and not broken. A DTX function is implemented when the frame being received comprises silence, rather than a voice transmission.

[0151] The Codecs, CNG, and DTX functions can be efficiently executed using a PU having an Arithmetic and Logic Unit (ALU), MAC unit, Barrel Shifter, and Normalization Unit. In a preferred embodiment, shown in FIG. 13, the CODEC PU 1300 comprises a load store architecture with a single Address Generation Unit (AGU) 1305, supporting zero-overhead looping and zero-overhead branching with delay slots, and an Instruction Decoder 1306.

[0152] In an exemplary embodiment, each MAC unit 1310 includes a compressor, sum and carry registers, an adder, and a saturation and rounding logic unit. The MAC unit 1310 is implemented as a compressor with feedback into the compression tree for accumulation. One preferred embodiment of a MAC 1310 has a latency of approximately 2 cycles with a throughput of 1 cycle. The MAC 1310 operates on two 17-bit operands, signed or unsigned. The intermediate results are kept in sum and carry registers. Guard bits are appended to the sum and carry registers for repeated MAC operations. The saturation logic converts the sum and carry results to 32-bit values. The rounding logic rounds a 32-bit number to a 16-bit number. Division logic is also implemented in the MAC unit 1310.
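The accumulate, saturate, and round behavior can be modeled numerically. The sketch below is a behavioral illustration only and makes several assumptions not found in the text: the accumulator is an unbounded Python integer standing in for the guard bits, saturation clamps to the signed 32-bit range, and rounding adds half of the discarded least significant part before keeping the upper 16 bits. All function names are hypothetical.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

def saturate32(value):
    """Clamp a wide accumulator value to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))

def round32_to_16(value32):
    """Round a 32-bit value to 16 bits (assumed rounding: add half, then truncate)."""
    return saturate32(value32 + 0x8000) >> 16

def mac(samples_a, samples_b):
    """Repeated Acc += a * b, followed by saturation to 32 bits."""
    acc = 0  # unbounded integer plays the role of the guard bits
    for a, b in zip(samples_a, samples_b):
        acc += a * b
    return saturate32(acc)

print(mac([1, 2, 3], [4, 5, 6]))          # 32
print(mac([32767] * 4, [32767] * 4))      # 2147483647 (saturated)
print(round32_to_16(0x00018000))          # 2
```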

[0153] In an exemplary embodiment, the ALU 1320 includes a 32-bit adder and a 32-bit logic circuit capable of performing a plurality of operations, including add, add with carry, subtract, subtract with borrow, negate, AND, OR, XOR, and NOT. One of the inputs to the ALU 1320 has an XOR array, which operates on 32-bit operands. The ALU 1320 comprises an absolute unit, a logic unit, and an addition/subtraction unit, and its absolute unit drives this array. Depending on the output of the absolute unit, the input operand is XORed with either one or zero to perform negation on the input operands.

[0154] In an exemplary embodiment, the Barrel Shifter 1330 is placed in series with the ALU 1320 and acts as a pre-shifter to operands requiring a shift operation followed by any ALU operations. One type of preferred Barrel Shifter can perform a maximum of 9-bit left or 26-bit right arithmetic shifts on 16-bit or 32-bit operands. The output of the Barrel Shifter is a 32-bit value, which is accessible to both the inputs of the ALU 1320.

[0155] In an exemplary embodiment, the Normalization unit 1340 counts the redundant sign bits in the number. It operates on 2's complement 16-bit numbers. Negative numbers are inverted to compute the redundant sign bits. The number to be normalized is fed into the XOR array. The other input comes from the sign bit of the number. Where the media being processed is voice, it is preferred to have an interface to the EC PU. The EC PU uses VAD to determine whether a frame being received comprises silence or speech. The VAD decision is preferably communicated to the CODEC PU so that it may determine whether to implement a Codec or DTX function.
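The counting of redundant sign bits can be illustrated with a short behavioral sketch. It assumes that a redundant sign bit is any bit below the sign bit that repeats the sign bit's value before the first differing bit; the function name is hypothetical, and the loop stands in for what would be combinational logic in hardware.

```python
def redundant_sign_bits(value):
    """Count redundant sign bits of a 16-bit two's complement number."""
    if not -32768 <= value <= 32767:
        raise ValueError("value must fit in 16 bits")
    word = value & 0xFFFF
    if word & 0x8000:
        word = ~word & 0xFFFF  # invert negative numbers, as the text describes
    # The sign bit is now clear; count the zeros below it before the first one.
    count = 0
    for bit in range(14, -1, -1):
        if word & (1 << bit):
            break
        count += 1
    return count

print(redundant_sign_bits(1))       # 14
print(redundant_sign_bits(0x4000))  # 0
print(redundant_sign_bits(-1))      # 15
```

The returned count is the left shift that would normalize the number without changing its sign.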

[0156] Operationally, the CODEC PU performs tasks in a pipeline fashion. A first pipeline stage comprises an instruction fetch wherein instructions are fetched into an instruction register from program memory. At the same time, the next program counter value is computed and stored in the program counter. In addition, loop and branch decisions are taken in the same cycle. A second pipeline stage comprises an instruction decode and operand fetch wherein an instruction is decoded and stored in a decode register. The instruction decode, register read and branch decisions happen in the instruction decode stage. In the third pipeline stage, the Execute 1 stage, the Barrel Shifter and the MAC compressor tree complete their computation. Addresses to data memory are also applied in this stage. In the fourth pipeline stage, the Execute 2 stage, the ALU, normalization unit, and the MAC adder complete their computation. Register write-back occurs and address registers are updated at the end of the Execute 2 stage. The states and history information required for the CODEC PU operations are fetched through a multi-channel DMA interface, as previously shown in each Media Layer.

[0157] By enabling different data streams to move through the pipelined stages concurrently, the CODEC PU reduces wait time for processing incoming media, such as voice. Referring to FIG. 13a, in time slot 1 1305 a, an instruction fetch task (IF) is performed for processing data from channel 1 1350 a. In time slot 2 1306 a, the IF task is performed for processing data from channel 2 1355 a while, concurrently, an instruction decode and operand fetch (IDOF) is performed for processing data from channel 1 1350 a. In time slot 3 1307 a, an IF task is performed for processing data from channel 3 1360 a while, concurrently, an instruction decode and operand fetch (IDOF) is performed for processing data from channel 2 1355 a and an Execute 1 (EX1) task is performed for processing data from channel 1 1350 a. In time slot 4 1308 a, an IF task is performed for processing data from channel 4 1370 a while, concurrently, an instruction decode and operand fetch (IDOF) is performed for processing data from channel 3 1360 a, an Execute 1 (EX1) task is performed for processing data from channel 2 1355 a, and an Execute 2 (EX2) task is performed for processing data from channel 1 1350 a. One of ordinary skill in the art would appreciate that, because channels are dynamically generated, the channel numbering may not reflect the actual location and assignment of a task. Channel numbering here is used to simply indicate the concept of pipelining across multiple channels and not to represent actual task locations.

[0158] The pipeline architecture of the present invention is not limited to instruction processing within PUs, but also exists on a PU to PU architecture level. As shown in FIG. 13b, multiple PUs may operate on a data set N in a pipeline fashion to complete the processing of a plurality of tasks where each task comprises a plurality of steps. A first PU 1305 b may be capable of performing echo cancellation functions, labeled task A. A second PU 1310 b may be capable of performing tone signaling functions, labeled task B. A third PU 1315 b may be capable of performing a first set of encoding functions, labeled task C. A fourth PU 1320 b may be capable of performing a second set of encoding functions, labeled task D. In time slot 1 1350 b, the first PU 1305 b performs task A1 1380 b on data set N. In time slot 2 1355 b, the first PU 1305 b performs task A2 1381 b on data set N and the second PU 1310 b performs task B1 1387 b on data set N. In time slot 3 1360 b, the first PU 1305 b performs task A3 1382 b on data set N, the second PU 1310 b performs task B2 1388 b on data set N, and the third PU 1315 b performs task C1 1394 b on data set N. In time slot 4 1365 b, the first PU 1305 b performs task A4 1383 b on data set N, the second PU 1310 b performs task B3 1389 b on data set N, the third PU 1315 b performs task C2 1395 b on data set N, and the fourth PU 1320 b performs task D1 1330 on data set N. In time slot 5 1370 b, the first PU 1305 b performs task A5 1384 b on data set N, the second PU 1310 b performs task B4 1390 b on data set N, the third PU 1315 b performs task C3 1396 b on data set N, and the fourth PU 1320 b performs task D2 1331 on data set N. In time slot 6 1375 b, the first PU 1305 b performs task A6 1385 b on data set N, the second PU 1310 b performs task B5 1391 b on data set N, the third PU 1315 b performs task C4 1397 b on data set N, and the fourth PU 1320 b performs task D3 1332 on data set N. One of ordinary skill in the art would appreciate how the pipeline processing would further progress.

[0159] In this exemplary embodiment, the combination of specialized PUs with a pipeline architecture enables the processing of a greater number of channels on a single media layer. Where each channel implements a G.711 codec and 128 ms of echo tail cancellation with DTMF detection/generation, voice activity detection (VAD), comfort noise generation (CNG), and call discrimination, the media engine layer operates at 1.95 MHz per channel. The resulting channel power consumption is at or about 6 mW per channel using 0.13μ standard cell technology.

[0160] Packet Engine

[0161] The Packet Engine of the present invention is a communications processor that, in a preferred embodiment, supports the plurality of interfaces and protocols used in media gateway processing systems between circuit-switched networks, packet-based IP networks, and cell-based ATM networks. The Packet Engine comprises a unique architecture capable of providing a plurality of functions for enabling media processing, including, but not limited to, cell and packet encapsulation, quality of service functions for traffic management and tagging for the delivery of other services and multi-protocol label switching, and the ability to bridge cell and packet networks.

[0162] Referring now to FIG. 14, an exemplary architecture of the Packet Engine 1400 is provided. In the embodiment depicted, the Packet Engine 1400 is configured to handle data rates up to approximately OC-12. One of ordinary skill in the art would appreciate that certain modifications can be made to the fundamental architecture to increase the data handling rates beyond OC-12. The Packet Engine 1400 comprises a plurality of processors 1405, a host processor 1430, an ATM engine 1440, in-bound DMA channel 1450, out-bound DMA channel 1455, a plurality of network interfaces 1460, a plurality of registers 1470, memory 1480, an interface to external memory 1490, and a means to receive control and signaling information 1495.

[0163] The processors 1405 comprise an internal cache 1407, central processing unit interface 1409, and data memory 1411. In a preferred embodiment, the processors 1405 comprise 32-bit reduced instruction set computing (RISC) processors with a 16 Kb instruction cache and a 12 Kb local memory. The central processing unit interface 1409 permits the processor 1405 to communicate with other memories internal to, and external to, the Packet Engine 1400. The processors 1405 are preferably capable of handling both in-bound and out-bound communication traffic. In a preferred implementation, generally half of the processors handle in-bound traffic while the other half handle out-bound traffic. The memory 1411 in the processor 1405 is preferably divided into a plurality of banks such that distinct elements of the Packet Engine 1400 can access the memory 1411 independently and without contention, thereby increasing overall throughput. In a preferred embodiment, the memory is divided into three banks, such that the in-bound DMA channel can write to memory bank one, while the processor is processing data from memory bank two, while the out-bound DMA channel is transferring processed packets from memory bank three.
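One way the three banks could be shared without contention is to rotate the role of each bank from cycle to cycle. The rotation order in this sketch is an assumption made for illustration (the text states only that the three elements use different banks at the same time), and the names are hypothetical.

```python
ROLES = ("in-bound DMA write", "processor", "out-bound DMA read")

def bank_assignment(cycle):
    """Return a dict mapping each role to a bank number (1 to 3) for a cycle.

    Assumed rotation: the bank just written by the in-bound DMA is processed
    next, and the bank just processed is drained by the out-bound DMA next.
    """
    n = len(ROLES)
    return {role: (i - cycle) % n + 1 for i, role in enumerate(ROLES)}

for cycle in range(3):
    print(cycle, bank_assignment(cycle))
# 0 {'in-bound DMA write': 1, 'processor': 2, 'out-bound DMA read': 3}
# 1 {'in-bound DMA write': 3, 'processor': 1, 'out-bound DMA read': 2}
# 2 {'in-bound DMA write': 2, 'processor': 3, 'out-bound DMA read': 1}
```

In every cycle the three roles map to three distinct banks, so no two elements touch the same bank at once.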

[0164] The ATM engine 1440 comprises two primary subcomponents, referred to herein as the ATMRx Engine and the ATMTx Engine. The ATMRx Engine processes an incoming ATM cell header and transfers the cell for corresponding AAL protocol, namely AAL1, AAL2, AAL5, processing in the internal memory or to another cell manager, if external to the system. The ATMTx Engine processes outgoing ATM cells and requests the outbound DMA channel to transfer data to a particular interface, such as the UTOPIAII/POSII interface. Preferably, it has separate blocks of local memory for data exchange. The ATM engine 1440 operates in combination with data memory 1483 to map an AAL channel, namely AAL2, to a corresponding channel on the TDM bus (where the Packet Engine 1400 is connected to a Media Engine) or to a corresponding IP channel identifier where internetworking between IP and ATM systems is required. The internal memory 1480 utilizes an independent block to maintain a plurality of tables for comparing and/or relating channel identifiers with virtual path identifiers (VPI), virtual channel identifiers (VCI), and compatibility identifiers (CID). A VPI is an eight-bit field in the ATM cell header which indicates the virtual path over which the cell should be routed. A VCI is the address or label of a virtual channel comprised of a unique numerical tag, defined by a 16-bit field in the ATM cell header, that identifies a virtual channel over which a stream of cells is to travel during the course of a session between devices. The plurality of tables are preferably updated by the host processor 1430 and are shared by the ATMRx and ATMTx engines.
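The header fields and table lookup described above can be sketched as follows. The bit layout assumes the standard 5-byte UNI cell header (4-bit GFC, 8-bit VPI, 16-bit VCI, 3-bit payload type, 1-bit CLP, 8-bit HEC), which matches the field widths given in the text. The table contents, the key shape, and the function names are hypothetical and serve only to illustrate relating VPI, VCI, and CID values to a channel.

```python
def parse_uni_header(header):
    """Extract (vpi, vci) from a 5-byte ATM UNI cell header."""
    if len(header) != 5:
        raise ValueError("an ATM cell header is 5 bytes")
    vpi = ((header[0] & 0x0F) << 4) | (header[1] >> 4)
    vci = ((header[1] & 0x0F) << 12) | (header[2] << 4) | (header[3] >> 4)
    return vpi, vci

# Hypothetical table relating (VPI, VCI, CID) to a TDM channel number.
channel_table = {(5, 100, 8): 17}

def lookup_channel(header, cid):
    """Return the mapped channel, or None when no table entry exists."""
    vpi, vci = parse_uni_header(header)
    return channel_table.get((vpi, vci, cid))

# Header carrying VPI 5 and VCI 100 (GFC, payload type, CLP and HEC left at zero).
example = bytes([0x00, 0x50, 0x06, 0x40, 0x00])
print(parse_uni_header(example))      # (5, 100)
print(lookup_channel(example, 8))     # 17
```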

[0165] The host processor 1430 is preferably a RISC processor with an instruction cache 1431. The host processor 1430 communicates with other hardware blocks through a CPU interface 1432 which is capable of managing communications with Media Engines over a bus, such as a PCI bus, and with a host, such as a signaling host through a PCI-PCI bridge. The host processor 1430 is capable of being interrupted by other processors 1405 through their transmission of interrupts which are handled by an interrupt handler 1433 in the CPU interface. It is further preferred that the host processor 1430 be capable of performing the following functions: 1) boot-up processing, including loading code from a flash memory to an external memory and starting execution, initializing interfaces and internal registers, acting as a PCI host, and appropriately configuring them, and setting up inter-processor communications between a signaling host, the packet engine itself, and media engines, 2) DMA configuration, 3) certain network management functions, 4) handling exceptions, such as the resolution of unknown addresses, fragmented packets, or packets with invalid headers, 5) providing intermediate storage of tables during system shutdown, 6) IP stack implementation, and 7) providing a message-based interface for users external to the packet engine and for communicating with the packet engine through the control and signaling means, among others.

[0166] In an embodiment, two DMA channels are provided for data exchange between different memory blocks via data buses. Referring to FIG. 14, the in-bound DMA channel 1450 is utilized to handle incoming traffic to the Packet Engine 1400 data processing elements and the out-bound DMA channel 1455 is utilized to handle outgoing traffic to the plurality of network interfaces 1460. The in-bound DMA channel 1450 handles all of the data coming into the Packet Engine 1400.

[0167] To receive and transmit data to ATM and IP networks, the Packet Engine 1400 has a plurality of network interfaces 1460 that permit the Packet Engine to compatibly communicate over networks. Referring to FIG. 15, in a preferred embodiment, the network interfaces comprise a GMII PHY interface 1562, a GMII MAC interface 1564, and two UTOPIAII/POSII interfaces 1566 in communication with 622 Mbps ATM/SONET connections 1568 to receive and transmit data. For IP-based traffic, the Packet Engine [not shown] supports MAC and emulates PHY layers of the Ethernet interface as specified in IEEE 802.3. The gigabit Ethernet MAC 1570 comprises FIFOs 1503 and a control state machine 1525. The transmit and receive FIFOs 1503 are provided for data exchange between the gigabit Ethernet MAC 1570 and bus channel interface 1505. The bus channel interface 1505 is in communication with the outbound DMA channel 1515 and in-bound DMA channel 1520 through a bus channel. When IP data is being received from the GMII MAC interface 1564, the MAC 1570 preferably sends a request to the DMA 1520 for data movement. Upon receiving the request, the DMA 1520 preferably checks the task queue [not shown] in the MAC interface 1564 and transfers the queued packets. In a preferred embodiment, the task queue in the MAC interface is a set of 64 bit registers containing a data structure comprising: length of data, source address, and destination address. Where the DMA 1520 is maintaining the write pointers for the plurality of destinations [not shown], the destination address will not be used. The DMA 1520 will move the data over the bus channel to memories located within the processors and will write the number of tasks at a predefined memory location. After completing writing of all tasks, the DMA 1520 will write the total number of tasks transferred to the memory page. The processor will process the received data and will write a task queue for an outbound channel of the DMA. The outbound DMA channel 1515 will check the number of frames present in the memory locations and, after reading the task queue, will move the data either to a POSII interface of the Media Engine Type I or II or to an external memory location where IP to ATM bridging is being performed.

[0168] For ATM only or ATM and IP traffic in combination, the Packet Engine supports two configurable UTOPIAII/POSII interfaces 1566 which provide an interface between the PHY and upper layer for IP/ATM traffic. The UTOPIAII/POSII 1580 comprises FIFOs 1504 and a control state machine 1526. The transmit and receive FIFOs 1504 are provided for data exchange between the UTOPIAII/POSII 1580 and bus channel interface 1506. The bus channel interface 1506 is in communication with the outbound DMA channel 1515 and in-bound DMA channel 1520 through a bus channel. The UTOPIAII/POSII interfaces 1566 may be configured in either UTOPIA level II or POS level II modes. When data is received on the UTOPIAII/POSII interface 1566, data will push existing tasks in the task queue forward and request the DMA 1520 to move the data. The DMA 1520 will read the task queue from the UTOPIAII/POSII interface 1566 which contains a data structure comprising: length of data, source address, and type of interface. Depending upon the type of interface, e.g. either POS or UTOPIA, the in-bound DMA channel 1520 will send the data either to the plurality of processors [not shown] or to the ATMRx engine [not shown]. After data is written into the ATMRx memory, it is processed by the ATM engine and passed to the corresponding AAL layer. On the transmit side, data is moved to the internal memory of the ATMTx engine [not shown] by the respective AAL layer. The ATMTx engine inserts the desired ATM header at the beginning of the cell and will request the outbound DMA channel 1515 to move the data to the UTOPIAII/POSII interface 1566 having a task queue with the following data structure: length of data and source address.

[0169] Referring to FIG. 16, to facilitate control and signaling functions, the Packet Engine 1600 has a plurality of PCI interfaces 1605, 1606, referred to in FIG. 14 as 1495. In a preferred embodiment, a signaling host 1610, through an initiator 1612, sends messages to be received by the Packet Engine 1600 to a PCI target 1605 via a communication bus 1617. The PCI target further communicates these messages through a PCI to PCI bridge 1620 to a PCI initiator 1606. The PCI initiator 1606 sends messages through a communication bus 1618 to a plurality of Media Engines 1650, each having a memory 1660 with a memory queue 1665.

[0170] Software Architecture

[0171] As previously discussed, operating on the above-described hardware architecture embodiments is a plurality of novel, integrated software systems designed to enable media processing, signaling, and packet processing. The novel software architecture enables the logical system, presented in FIG. 5, to be physically deployed in a number of ways, depending on processing needs.

[0172] Communication between any two modules, or components, in the software system is facilitated by application program interfaces (APIs) that remain substantially constant and consistent irrespective of whether the software components reside on a hardware element or across multiple hardware elements. This permits the mapping of components onto different processing elements, thereby modifying physical interfaces, without the concurrent modification of the individual components.

[0173] In an exemplary embodiment, shown in FIG. 17, a first component 1705 operates in conjunction with a second component 1710 and a third component 1715 through a first interface 1720 and second interface 1725, respectively. Because all three components 1705, 1710, 1715 are executing on the same physical processor 1700, the first interface 1720 and second interface 1725 perform interfacing tasks through function mapping conducted via the APIs of each of the three components 1705, 1710, 1715. Referring to FIG. 17a, where the first 1705a, second 1710a, and third 1715a components reside on separate hardware elements 1700a, 1701a, 1702a respectively, e.g. separate processors or processing elements, the first interface 1720a and second interface 1725a implement interfacing tasks through queues 1721a, 1726a in shared memory. While the interfaces 1720a, 1725a are no longer limited to function mapping and messaging, the components 1705a, 1710a, 1715a continue to use the same APIs to conduct inter-component communication. The consistent use of a standard API enables the porting of various components to different hardware architectures in a distributed processing environment by relying on modified interfaces or drivers where necessary and without modifications in the components themselves.
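
The arrangement of FIGS. 17 and 17a may be sketched, under stated assumptions, as follows. All class and method names are hypothetical, and an in-process `queue.Queue` merely stands in for the queues 1721a, 1726a in shared memory; the point illustrated is only that the component code is the same in both deployments and only the interface object differs.

```python
import queue


class Component:
    """A software component with a fixed API (receive / notify)."""

    def __init__(self, name):
        self.name = name
        self.inbox = []

    def receive(self, message):
        self.inbox.append(message)

    def notify(self, interface, message):
        # The component makes the same call whichever interface it is given.
        interface.send(message)


class FunctionMappingInterface:
    """Same-processor case: the interface maps directly onto a function call."""

    def __init__(self, target):
        self._target = target

    def send(self, message):
        self._target.receive(message)


class SharedQueueInterface:
    """Separate-processor case: messages pass through a queue."""

    def __init__(self, target):
        self._target = target
        self._queue = queue.Queue()

    def send(self, message):
        self._queue.put(message)

    def poll(self):
        """Run on the receiving element: drain queued messages into the target."""
        while not self._queue.empty():
            self._target.receive(self._queue.get())
```

Moving the second component to another processing element would, in this sketch, mean constructing a `SharedQueueInterface` in place of a `FunctionMappingInterface`, with no change to `Component`.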

[0174] Referring now to FIG. 18, a logical division of the software system 1800 is shown. The software system 1800 is divided into three subsystems, a Media Processing Subsystem 1805, a Packetization Subsystem 1840, and a Signaling/Management Subsystem (hereinafter referred to as the Signaling Subsystem) 1870. The Media Processing Subsystem 1805 sends encoded data to the Packetization Subsystem 1840 for encapsulation and transmission over the network and receives network data from the Packetization Subsystem 1840 to be decoded and played out. The Signaling Subsystem 1870 communicates with the Packetization Subsystem 1840 to get status information such as the number of packets transferred, to monitor the quality of service, and to control the mode of particular channels, among other functions. The Signaling Subsystem 1870 also communicates with the Packetization Subsystem 1840 to control establishment and destruction of packetization sessions for the origination and termination of calls. Each subsystem 1805, 1840, 1870 further comprises a series of components 1820 designed to perform different tasks in order to effectuate the processing and transmission of media. Each of the components 1820 conducts communications with any other module, subsystem, or system through APIs that remain substantially constant and consistent irrespective of whether the components reside on a hardware element or across multiple hardware elements, as previously discussed.

[0175] In an exemplary embodiment, shown in FIG. 19, the Media Processing Subsystem 1905 comprises a system API component 1907, media API component 1909, real-time media kernel 1910, and voice processing components, including line echo cancellation component 1911, components dedicated to performing voice activity detection 1913, comfort noise generation 1915, and discontinuous transmission management 1917, a component 1919 dedicated to handling tone signaling functions, such as dual tone (DTMF/MF), call progress, call waiting, and caller identification, and components for media encoding and decoding functions for voice 1927, fax 1929, and other data 1931.

[0176] The system API component 1907 should be capable of providing system-wide management and enabling the cohesive interaction of individual components, including establishing communications between external applications and individual components, managing run-time component addition and removal, downloading code from central servers, and accessing the MIBs of components upon request from other components. The media API component 1909 interacts with the real time media kernel 1910 and individual voice processing components. The real time media kernel 1910 allocates media processing resources, monitors resource utilization on each media-processing element, and performs load balancing to substantially maximize density and efficiency.

[0177] The voice processing components can be distributed across multiple processing elements. The line echo cancellation component 1911 deploys adaptive filter algorithms to remove from a signal echoes that may arise as a result of the reflection and/or retransmission of modified input signals back to the originator of the input signals. In one preferred embodiment, the line echo cancellation component 1911 has been programmed to implement the following filtration approach: An adaptive finite impulse response (FIR) filter of length N is converged using a convergence process, such as a least mean squares (LMS) approach. The adaptive filter generates a filtered output by obtaining individual samples of the far-end signal on a receive path, convolving the samples with the calculated filter coefficients, and then subtracting, at the appropriate time, the resulting echo estimate from the received signal on the transmit channel. With convergence complete, the filter is then converted to an infinite impulse response (IIR) filter using a generalization of the ARMA-Levinson approach. In the course of operation, data is received from an input source and used to adapt the zeroes of the IIR filter using the LMS approach, keeping the poles fixed. The adaptation process generates a set of converged filter coefficients that are then continually applied to the input signal to create a modified signal used to filter the data. The error between the modified signal and actual signal received is monitored and used to further adapt the zeroes of the IIR filter. If the measured error is greater than a pre-determined threshold, convergence is re-initiated by reverting back to the FIR convergence step.
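
The FIR convergence step described above can be illustrated with a minimal sketch of a textbook LMS update. It covers only the first stage of the approach (converging an N-tap FIR filter and subtracting the echo estimate); the conversion to an IIR filter and the error-threshold fallback are not shown. The function name, the tap count and the step size are assumptions chosen for illustration.

```python
def lms_echo_cancel(far_end, received, num_taps=4, step=0.05):
    """Adapt an N-tap FIR filter with LMS; return (residual_signal, taps)."""
    taps = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, received):
        history = [x] + history[:-1]
        # Convolve the far-end samples with the current coefficients.
        echo_estimate = sum(w * h for w, h in zip(taps, history))
        # Subtract the echo estimate from the received signal.
        e = d - echo_estimate
        # LMS coefficient update driven by the residual error.
        taps = [w + step * e * h for w, h in zip(taps, history)]
        residual.append(e)
    return residual, taps
```

With a noise-free echo path shorter than the filter, the taps in this sketch approach the echo path coefficients and the residual decays toward zero; the step size must be small relative to the input power for the update to remain stable.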

[0178] The voice activity detection component 1913 receives incoming data and determines whether voice or another type of signal, i.e. noise, is present in the received data, based upon an analysis of certain data parameters. The comfort noise generation component 1915 operates to send a Silence Insertion Descriptor (SID) containing information that enables a decoder to generate noise corresponding to the background noise received from the transmission. An overlay of audible but non-obtrusive noise has been found to be valuable in helping users discern whether a connection is live or dead. The SID frame is typically small, i.e. approximately 15 bits under the G.729 B codec specification. Preferably, updated SID frames are sent to the decoder whenever there has been sufficient change in the background noise.

[0179] The tone signaling component 1919, including recognition of DTMF/MF, call progress, call waiting, and caller identification, operates to intercept tones meant to signal a particular activity or event, such as the conducting of two-stage dialing (in the case of DTMF tones), the retrieval of voice-mail, and the reception of an incoming call (in the case of call waiting), and communicate the nature of that activity or event in an intelligent manner to a receiving device, thereby avoiding the encoding of that tone signal as another element in a voice stream. In one embodiment, the tone-signaling component 1919 is capable of recognizing a plurality of tones and, therefore, when one tone is received, of sending a plurality of RTP packets that identify the tone, together with other indicators, such as length of the tone. By carrying the occurrence of an identified tone, the RTP packets convey the event associated with the tone to a receiving unit. In a second embodiment, the tone-signaling component 1919 is capable of generating a dynamic RTP profile wherein the RTP profile carries information detailing the nature of the tone, such as the frequency, volume, and duration. By carrying the nature of the tone, the RTP packets convey the tone to the receiving unit and permit the receiving unit to interpret the tone and, consequently, the event or activity associated with it.

[0180] Components for the media encoding and decoding functions for voice 1927, fax 1929, and other data 1931, referred to as codecs, are devised in accordance with International Telecommunications Union (ITU) standard specifications, such as G.711 for the encoding and decoding of voice, fax, and other data. An exemplary codec for voice, data, and fax communications is ITU standard G.711, often referred to as pulse code modulation. G.711 is a waveform codec with a sampling rate of 8,000 Hz. Under uniform quantization, signal levels would typically require at least 12 bits per sample, resulting in a bit rate of 96 kbps (8,000 samples per second at 12 bits each). Under non-uniform quantization, as is commonly used, signal levels require approximately 8 bits per sample, leading to a 64 kbps rate. Other voice codecs include ITU standards G.723.1, G.726, and G.729 A/B/E, all of which would be known and appreciated by one of ordinary skill in the art. Other ITU standards supported by the fax media processing component 1929 preferably include T.38 and standards falling within V.xx, such as V.17, V.90, and V.34. Exemplary codecs for fax include ITU standards T.4 and T.30. T.4 addresses the formatting of fax images and their transmission from sender to receiver by specifying how the fax machine scans documents, the coding of scanned lines, the modulation scheme used, and the transmission scheme used. Other codecs include ITU standard T.38.

[0181] Referring to FIG. 20, in an exemplary embodiment, the Packetization Subsystem 2040 comprises a system API component 2043, packetization API component 2045, POSIX API 2047, real-time operating system (RTOS) 2049, components dedicated to performing such quality of service functions as buffering and traffic management 2050, a component for enabling IP communications 2051, a component for enabling ATM communications 2053, a component for resource-reservation protocol (RSVP) 2055, and a component for multi-protocol label switching (MPLS) 2057. The Packetization Subsystem 2040 facilitates the encapsulation of encoded voice/data into packets for transmission over ATM and IP networks, manages certain quality of service elements, including packet delay, packet loss, and jitter management, and implements traffic shaping to control network traffic. The packetization API component 2045 provides external applications facilitated access to the Packetization Subsystem 2040 by communicating with the Media Processing Subsystem [not shown] and Signaling Subsystem [not shown].

[0182] The POSIX API 2047 layer isolates the operating system (OS) from the components and provides the components with a consistent OS API, thereby ensuring that components above this layer do not have to be modified if the software is ported to another OS platform. The RTOS 2049 acts as the OS facilitating the implementation of software code into hardware instructions.

[0183] The IP communications component 2051 supports packetization for TCP/IP, UDP/IP, and RTP/RTCP protocols. The ATM communications component 2053 supports packetization for AAL1, AAL2, and AAL5 protocols. It is preferred that the RTP/UDP/IP stack be implemented on the RISC processors of the Packet Engine. A portion of the ATM stack is also preferably implemented on the RISC processors with more computationally intensive parts of the ATM stack implemented on the ATM engine.

[0184] The component for RSVP 2055 specifies resource-reservation techniques for IP networks. The RSVP protocol enables resources to be reserved for a certain session (or a plurality of sessions) prior to any attempt to exchange media between the participants. Two levels of service are generally enabled, including a guaranteed level which emulates the quality achieved in conventional circuit switched networks, and controlled load which is substantially equal to the level of service achieved in a network under best-effort and no-load conditions. In operation, a sending unit issues a PATH message to a receiving unit via a plurality of routers. The PATH message contains a traffic specification (Tspec) that provides details about the data that the sender expects to send, including bandwidth requirement and packet size. Each RSVP-enabled router along the transmission path establishes a path state that includes the previous source address of the PATH message (the prior router). The receiving unit responds with a reservation request (RESV) that includes a flow specification having the Tspec and information regarding the type of reservation service requested, such as controlled-load or guaranteed service. The RESV message travels back, in reverse fashion, to the sending unit along the same router pathway. At each router, the requested resources are allocated, provided such resources are available and the receiver has authority to make the request. The RESV eventually reaches the sending unit with a confirmation that the requisite resources have been reserved.

[0185] The component for MPLS 2057 operates to mark traffic at the entrance to a network for the purpose of determining the next router in the path from source to destination. More specifically, the MPLS 2057 component attaches to the packet, in front of the IP header, a label containing all of the information a router needs to forward the packet. The value of the label is used to look up the next hop in the path and serves as the basis for forwarding the packet to the next router. Conventional IP routing operates similarly, except that the MPLS process searches for an exact match, not the longest match as in conventional IP routing.
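
The difference between the two lookups can be sketched as follows, purely for illustration. The function names, the table contents and the use of Python's standard `ipaddress` module are assumptions and do not describe how the MPLS component 2057 is implemented.

```python
import ipaddress


def mpls_next_hop(label_table, label):
    """Exact-match lookup: the label itself is the key into the table."""
    return label_table.get(label)


def ip_next_hop(routes, destination):
    """Longest-prefix match over a list of (network, next_hop) routes."""
    addr = ipaddress.ip_address(destination)
    best = None
    for network, hop in routes:
        net = ipaddress.ip_network(network)
        # Keep the matching route with the longest prefix seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None
```

In this sketch the label lookup is a single dictionary access, whereas the IP lookup must compare the destination against every candidate prefix and keep the most specific one.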

[0186] One function that could be provided in either the Media Processing Subsystem or the Packetization Subsystem is jitter buffer management. As previously discussed, an embodiment of the present invention operates by estimating a packet delay histogram that may be used to determine the required buffer size and minimum delay. The preferred method of determining the buffer size and minimum delay comprises the selection of an area of the histogram, the calculation of the mean delay based upon the selected area, the calculation of a plurality of variances based upon the mean delay, and the use of the variances to determine buffer size and minimum delay.

[0187] Referring back to FIG. 1f, the graph represents a histogram 101f of a packet stream received by a media gateway, more specifically the Media Processing Subsystem or Packetization Subsystem. The x-axis 102f represents the delay experienced by packets and the y-axis 103f represents the number of packet samples received. The vertical bars 104f show the number of packets received in a defined span of time. A curve 105f connects the central points of the tops of the bars 104f of the histogram 101f. The curve 105f depicts the distribution of the arrival time of packets.

[0188] As previously discussed, to avoid skewing the peak, or mean delay, calculation, the tail is eliminated at a defined point 106f, which in this example is 270 ms on the x-axis 102f. Therefore, the histogram area to the right of point 106f is discarded. The mean of the curve 107f may be calculated by using the formula: $M = \frac{\sum_{i} x_{i}}{N}$

[0189] where M is the mean, $x_{i}$ represents the delay experienced by packets arriving in a particular window of time i, and N is the total number of samples.
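
As a minimal sketch of the mean calculation with the tail discarded, the following treats each entry of a list as one delay sample in milliseconds. The function name is hypothetical, and treating a sample exactly at the cutoff as retained is an assumption; the text above does not say on which side of point 106f such a sample falls.

```python
def truncated_mean(delays_ms, cutoff_ms):
    """Mean of the delay samples at or below the cutoff; the tail is discarded."""
    kept = [x for x in delays_ms if x <= cutoff_ms]
    if not kept:
        raise ValueError("no samples at or below the cutoff")
    return sum(kept) / len(kept)
```

For example, with samples of 100, 150, 200 and 400 ms and a cutoff of 270 ms, the 400 ms sample is dropped and the mean of the remaining three samples is 150 ms.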

[0190] The preferred embodiment of the invention utilizes at least two separately calculated variances to better estimate the buffer size and delay based upon the estimated histogram. To calculate the plurality of variances, the histogram is conceptually divided into two portions, a portion encompassing packets that arrived prior to the mean delay and a portion encompassing packets arriving after the mean delay. Where i packets have been received and the mean delay is associated with packet m, the first portion is defined by $D_{0}$ to $D_{m-1}$ and the second by $D_{m+1}$ to $D_{i}$, the final packet. The variance of $D_{0}$ to $D_{m-1}$, Var₁, may be calculated using the formula: $\mathrm{Var}_{1} = \frac{\sum_{j=0}^{m-1} \left( x_{j} - M \right)^{2}}{N_{0\ \mathrm{to}\ m-1}}$ or $\mathrm{Var}_{1} = \frac{\sum_{j=0}^{m-1} \left| x_{j} - M \right|}{N_{0\ \mathrm{to}\ m-1}}$

[0191] where j extends from 0 to m−1 and the total number of samples includes those samples from 0 to m−1. Similarly, the variance of $D_{m+1}$ to $D_{i}$, Var₂, may be calculated using the formula: $\mathrm{Var}_{2} = \frac{\sum_{j=m+1}^{i} \left( x_{j} - M \right)^{2}}{N_{m+1\ \mathrm{to}\ i}}$ or $\mathrm{Var}_{2} = \frac{\sum_{j=m+1}^{i} \left| x_{j} - M \right|}{N_{m+1\ \mathrm{to}\ i}}$

[0192] where j extends from m+1 to i and the total number of samples includes those samples from m+1 to i. Although the two separately calculated variances are calculated using one sample set of packets arriving before the mean delay and one sample set of packets arriving after the mean delay, one would appreciate that the variances can be calculated using sample sets that overlap or that, when taken together, comprise only a subset of the packets received.
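
The two-sided calculation may be sketched as below. The function name is hypothetical, and the sketch splits the samples by comparing each delay value with the mean rather than by packet index, which is an assumption. Both forms given above are offered: the squared form by default and the absolute-deviation form when `absolute=True`. Note that only the absolute-deviation form returns a value in the same units as the delays.

```python
def split_variances(delays_ms, mean_ms, absolute=False):
    """Return (var1, var2): spread below and above the mean, both about the mean."""

    def spread(samples):
        if not samples:
            return 0.0
        if absolute:
            return sum(abs(x - mean_ms) for x in samples) / len(samples)
        return sum((x - mean_ms) ** 2 for x in samples) / len(samples)

    below = [x for x in delays_ms if x < mean_ms]  # portion before the mean
    above = [x for x in delays_ms if x > mean_ms]  # portion after the mean
    return spread(below), spread(above)
```

For samples of 130, 140, 150, 180 and 200 ms about a mean of 150 ms, the absolute form gives 15 ms below the mean and 40 ms above it, reflecting the longer right-hand side of the distribution.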

[0193] Typically, the two variances are not equal because the histogram is asymmetrical. As shown in FIG. 1f, Var₁ 115f is less than Var₂ 117f, reflective of the asymmetrical nature of the histogram and better approximating the actual distribution of packets received. This approach therefore represents an improved approach to ascertaining the size and placement of the buffer more accurately while optimizing computational resources.

[0194] Optionally, Var₁ can be calculated from Var₂, or vice versa, using pre-defined equations. As an example, Var₁ could be a multiple or factor of Var₂, i.e. Var₁*C=Var₂, where C is a constant that is determined experimentally. Alternatively, Var₁ could be a fixed value depending on whether Var₂ exceeds or does not exceed a certain threshold value.

[0195] After the peak and variances are calculated, the buffer size and timing can be determined. The buffer starts accepting packets at delay d, which is determined by subtracting Var₁ 115f from the mean 107f.

d = M − Var₁

[0196] and continues accepting for a period (T) which is the sum of the two variances.

T = Var₁ + Var₂

[0197] For example, where Var₁ is 60 ms, Var₂ is 105 ms and the mean is 150 ms, the buffer starts accepting packets at 90 ms and continues accepting for period T of 165 ms, or up to 255 ms. The variances used to determine the buffer parameters can also be calculated variances derived by multiplying Var₁ and/or Var₂ by a multiplier (k) where the multiplier is any number, but preferably in the range of 2-8, and more preferably around 2, 4 or 8. Utilizing this approach, the Media Processing Subsystem or Packetization Subsystem is better able to manage jitter in packets received by the Media Gateway system.
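
The worked example above can be reproduced with the following sketch. The function name is hypothetical, and applying a single multiplier k to both variances is an assumption made for brevity; the text permits the multiplier to be applied to either variance or to both.

```python
def buffer_window(mean_ms, var1_ms, var2_ms, k=1.0):
    """Return (start_delay, period, end_delay) for the playout buffer, in ms."""
    v1 = k * var1_ms
    v2 = k * var2_ms
    start = mean_ms - v1   # d = M - Var1
    period = v1 + v2       # T = Var1 + Var2
    return start, period, start + period
```

Calling `buffer_window(150, 60, 105)` yields a start delay of 90 ms, a period of 165 ms and an end point of 255 ms, matching the figures given above.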

[0198] Referring to FIG. 21, in an exemplary embodiment, the Signaling Subsystem 2170 comprises a user application API component 2173, system API component 2175, POSIX API 2177, real-time operating system (RTOS) 2179, a signaling API 2181, components dedicated to performing such signaling functions as signaling stacks for ATM networks 2183 and signaling stacks for IP networks 2185, and a network management component 2187. The signaling API 2181 provides facilitated access to the signaling stacks for ATM networks 2183 and signaling stacks for IP networks 2185. The signaling API 2181 comprises a master gateway and sub-gateways of N number. A single master gateway can have N sub-gateways associated with it. The master gateway performs the demultiplexing of incoming calls arriving from an ATM or IP network and routes the calls to the sub-gateway that has resources available. The sub-gateways maintain the state machines for all active terminations. The sub-gateways can be replicated to handle many terminations. Using this design, the master gateway and sub-gateways can reside on a single processor or across multiple processors, thereby enabling the simultaneous processing of signaling for a large number of terminations and the provision of substantial scalability.
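
The relationship between the master gateway and its sub-gateways may be pictured with the sketch below. The class names, the fixed per-sub-gateway capacity and the first-available selection rule are all assumptions; the text states only that calls are routed to a sub-gateway that has resources available and that sub-gateways hold the state for active terminations.

```python
class SubGateway:
    """Holds the state of the terminations it is currently serving."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.terminations = {}  # call id -> state

    def has_resources(self):
        return len(self.terminations) < self.capacity

    def accept(self, call_id):
        self.terminations[call_id] = "active"


class MasterGateway:
    """Demultiplexes incoming calls onto its sub-gateways."""

    def __init__(self, sub_gateways):
        self.sub_gateways = list(sub_gateways)

    def route(self, call_id):
        """Hand the call to the first sub-gateway with free resources."""
        for sub in self.sub_gateways:
            if sub.has_resources():
                sub.accept(call_id)
                return sub.name
        return None  # no sub-gateway has capacity
```

Adding capacity in this sketch amounts to appending another `SubGateway` to the list held by the master gateway.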

[0199] The user application API component 2173 provides a means for external applications to interface with the entire software system, comprising each of the Media Processing Subsystem, Packetization Subsystem, and Signaling Subsystem. The network management component 2187 supports local and remote configuration and network management through the support of simple network management protocol (SNMP). The configuration portion of the network management component 2187 is capable of communicating with any of the other components to conduct configuration and network management tasks and can route remote requests for tasks, such as the addition or removal of specific components.

[0200] The signaling stacks for ATM networks 2183 include support for User Network Interface (UNI) for the communication of data using AAL1, AAL2, and AAL5 protocols. User Network Interface comprises specifications for the procedures and protocols between the gateway system, comprising the software system and hardware system, and an ATM network. The signaling stacks for IP networks 2185 include support for a plurality of accepted standards, including media gateway control protocol (MGCP), H.323, session initiation protocol (SIP), H.248, and network-based call signaling (NCS). MGCP specifies a protocol converter, the components of which may be distributed across multiple distinct devices. MGCP enables external control and management of data communications equipment, such as media gateways, operating at the edge of multi-service packet networks. H.323 standards define a set of call control, channel set up, and codec specifications for transmitting real time voice and video over networks that do not necessarily provide a guaranteed level of service, such as packet networks. SIP is an application layer protocol for the establishment, modification, and termination of conferencing and telephony sessions over an IP-based network and has the capability of negotiating features and capabilities of the session at the time the session is established. H.248 provides recommendations underlying the implementation of MGCP.

[0201] To further enable ease of scalability and implementation, the present software method and system does not require specific knowledge of the processing hardware being utilized. Referring to FIG. 22, in a typical embodiment, a host application 2205 interacts with a DSP 2210 via an interrupt capability 2220 and shared memory 2230. As shown in FIG. 23, the same functionality can be achieved by a simulation execution through the operation of a virtual DSP program 2310 as a separate independent thread on the same processor 2315 as the application code 2320. This simulation run is enabled by a task queue mutex 2330 and a condition variable 2340. The task queue mutex 2330 protects the data shared between the virtual DSP program 2310 and a resource manager [not shown]. The condition variable 2340 allows the application to synchronize with the virtual DSP 2310 in a manner similar to the function of the interrupt 2220 in FIG. 22.
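
A simulation arrangement of the kind shown in FIG. 23 might be sketched as follows, using Python's standard `threading` module. The class name, the `None` stop sentinel and the placeholder processing step (doubling a number) are assumptions made for illustration; only the use of a mutex-protected task queue and a condition variable follows the description above.

```python
import threading
from collections import deque


class VirtualDSP:
    """Runs as a separate thread on the same processor as the application."""

    def __init__(self):
        self.tasks = deque()
        self.results = []
        self.lock = threading.Lock()                 # task queue mutex
        self.ready = threading.Condition(self.lock)  # condition variable
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def submit(self, task):
        # The application signals the virtual DSP much as an interrupt would.
        with self.ready:
            self.tasks.append(task)
            self.ready.notify()

    def _run(self):
        while True:
            with self.ready:
                while not self.tasks:
                    self.ready.wait()
                task = self.tasks.popleft()
            if task is None:  # sentinel: stop the thread
                return
            self.results.append(task * 2)  # placeholder for real processing

    def stop(self):
        self.submit(None)
        self.thread.join()
```

The application code calls `submit` in the same way whether the work is done by this thread or, in a hardware deployment, by a physical DSP reached through an interrupt and shared memory.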

[0202] The present methods and systems provide for an improved jitter buffer management method and system by basing playout buffer adjustments on computed minimum delays and buffer sizes with reference to a plurality of variances derived from an estimated histogram. While various embodiments of the present invention have been shown and described, it would be apparent to those skilled in the art that many modifications are possible without departing from the inventive concept disclosed herein. For example, it would be apparent that the plurality of variances can be calculated by determining a first variance from an estimated histogram and then deriving subsequent variances through any pre-defined equation incorporating the first variance.

What is claimed is:
 1. A system for smoothing jitter experienced by data packets in transmission from a transmitter to a receiver, comprising: a delay estimator adapted to estimate an adaptive packet delay histogram, having a mean, representative of the delays experienced by data packets in transmission from a transmitter to a receiver; a playout delay evaluator in communication with the delay estimator and adapted to calculate a playout time, wherein the calculation of said playout time utilizes said mean and a first variance derived from a portion of said packet delay histogram; and a playout buffer monitor adapted to buffer the data packets for the delay amount determined by the playout delay evaluator and then output the delayed data packets.
 2. The system of claim 1, wherein the delay is calculated by subtracting the first variance from a mean delay experienced by data packets in transmission from a transmitter to a receiver.
 3. The system of claim 2, wherein the variance is calculated based upon a portion of the histogram that is less than the mean delay.
 4. The system of claim 1, wherein the first variance is calculated using a second variance calculated from a portion of the histogram that differs from the portion used to derive the first variance.
 5. The system of claim 1, further comprising a delay smoother to control changes in playout time.
 6. The system of claim 1, wherein the playout time is further controlled by expanding increases in playout time and limiting decreases in playout time.
 7. A method for substantially reducing jitter experienced by data packets in transmission from a transmitter to a receiver, comprising: estimating a mean delay using a packet delay histogram representative of the delays experienced by data packets in transmission from a transmitter to a receiver; deriving a first variance from a first portion of said histogram; deriving a second variance from a second portion of said histogram, wherein said first portion and second portion are not identical; setting a delay equal to a function of the mean delay and the first variance; setting a buffer size equal to a function of the first and second variance; and buffering data packets in accordance with said buffer size and delay.
 8. The method of claim 7, wherein the delay is equal to the mean delay minus the first variance.
 9. The method of claim 7, wherein the buffer size is equal to the sum of the first and second variances.
 10. A method for substantially reducing jitter experienced by data packets in transmission from a transmitter to a receiver, comprising: estimating a mean delay using a packet delay histogram representative of the delays experienced by data packets in transmission from a transmitter to a receiver; deriving a first variance from a first portion of said histogram; deriving a second variance as a function of the first variance; setting a delay equal to a function of the mean delay and the first variance; setting a buffer size equal to a function of the first and second variance; and buffering data packets in accordance with said buffer size and minimum delay.
 11. The method of claim 10, wherein the second variance is equal to the first variance multiplied by a constant.
 12. The method of claim 10, wherein the second variance is equal to a constant minus the first variance.
 13. A system for smoothing jitter experienced by data packets in transmission from a transmitter to a receiver, comprising: a delay estimator for estimating a packet delay histogram representative of the delays experienced by data packets in transmission from a transmitter to a receiver; and a playout buffer monitor having a buffer size equal to the sum of a first variance and a second variance, wherein the first variance is calculated from a first portion of said packet delay histogram and the second variance is calculated from a second portion of said packet delay histogram, and wherein said playout buffer monitor buffers the data packets for a minimum delay amount determined by the first variance.
 14. A system for managing jitter experienced by data packets in transmission from a transmitter to a receiver, comprising: a delay estimator for estimating a packet delay histogram and a mean delay experienced by data packets in transmission from a transmitter to a receiver; and a playout delay evaluator in communication with the delay estimator and adapted to determine a plurality of variances based upon a plurality of portions of the packet delay histogram, wherein the calculation of a first variance is used to determine a delay and the calculation of a second variance is used to determine a buffer size; and a playout buffer monitor having the calculated buffer size wherein the playout buffer monitor buffers the data packets selected by the playout delay evaluator for the delay and then outputs the delayed data packets.
 15. A media processing system for transmitting, receiving, and processing media across networks wherein the media processing system has substantially reduced jitter experienced by data packets in transmission from a transmitter to a receiver, comprising: a plurality of media processors wherein the media processor is capable of processing media; a plurality of packet processors in communication with at least one of said media processors wherein the packet processor is capable of packetizing processed media; a host processor in communication with at least one said packet or media processors; and a playout buffer, implemented in either the media processor or packet processor, having a buffer size equal to a function of a first variance and a second variance and using a delay equal to a function of a mean delay and the first variance wherein said mean delay, first variance and said second variance are determined from a packet delay histogram representative of delays experienced by data packets in transmission from a transmitter to a receiver.
 16. The system of claim 15, wherein the second variance is equal to a function of the first variance and a constant.
 17. The system of claim 15, wherein the delay is calculated by subtracting the first variance from the mean delay.
 18. The system of claim 15, wherein the first variance is derived from a portion of the histogram that is less than the mean delay.
 19. The system of claim 15, wherein the second variance is derived from a portion of the histogram that differs from the portion used to derive the first variance.
 20. A media processing system for transmitting, receiving, and processing media across networks, comprising: a plurality of media processors, each of said media processors having a plurality of processing layers wherein each processing layer has at least one processing unit, at least one program memory, and at least one data memory, each of said processing unit, program memory, and data memory being in communication with one another; a plurality of packet processors in communication with at least one of said media processors wherein each of said packet processors is capable of packetizing processed media; a host processor in communication with at least one of said plurality of packet processors or at least one of said plurality of media processors; and a playout buffer, implemented in either the at least one of said plurality of packet processors or at least one of said plurality of media processors, having a buffer size equal to a function of a first variance and a second variance wherein said first variance and said second variance are determined from a packet delay histogram representative of delays experienced by data packets in transmission from a transmitter to a receiver.
 21. The media processing system of claim 20, wherein at least one processing unit in at least one of said processing layers performs echo cancellation functions on received data, wherein at least one processing unit in at least one of said processing layers performs encoding or decoding functions on received data, and wherein a task scheduler is adapted to receive a plurality of tasks from a source and distribute said tasks to the processing layers. 
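By way of illustration only, and without limiting any of the foregoing claims, the computation recited in claims 7 through 9 and in claims 10 and 11 may be sketched as follows. The sketch rests on several assumptions that are not taken from the claims: the histogram is represented as a mapping from delay in milliseconds to packet count; each "variance" is taken to be a one-sided root-mean-square deviation about the mean, so that it carries units of time; the first portion is the part of the histogram below the mean and the second portion is the remainder; and all function names and the constant `k` are hypothetical.

```python
from math import sqrt


def histogram_mean(hist):
    """Estimate the mean delay from a {delay_ms: packet_count} histogram."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram contains no packets")
    return sum(delay * count for delay, count in hist.items()) / total


def one_sided_spread(hist, mean, below):
    """RMS deviation about the mean over one portion of the histogram.

    Assumption: this stands in for the claimed "variance". below=True
    selects delays under the mean; below=False selects the remainder.
    """
    portion = [(d, c) for d, c in hist.items() if (d < mean) == below]
    packets = sum(c for _, c in portion)
    if packets == 0:
        return 0.0
    return sqrt(sum(c * (d - mean) ** 2 for d, c in portion) / packets)


def playout_parameters(hist):
    """Delay and buffer size in the manner of claims 8 and 9."""
    mean = histogram_mean(hist)
    first = one_sided_spread(hist, mean, below=True)
    second = one_sided_spread(hist, mean, below=False)
    delay = mean - first          # claim 8: mean delay minus first variance
    buffer_size = first + second  # claim 9: sum of the two variances
    return delay, buffer_size


def playout_parameters_scaled(hist, k):
    """Variant in the manner of claim 11: second variance = k * first.

    The constant k is a hypothetical tuning value, not given in the claims.
    """
    mean = histogram_mean(hist)
    first = one_sided_spread(hist, mean, below=True)
    second = k * first
    return mean - first, first + second


# Usage: two packets, one arriving after 10 ms and one after 30 ms.
delay, buffer_size = playout_parameters({10: 1, 30: 1})
```

In this usage the mean delay is 20 ms and each one-sided spread is 10 ms, giving a delay of 10 ms and a buffer size of 20 ms. Other spread measures or other divisions of the histogram into portions could be substituted without changing the structure of the computation.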